7 research outputs found

    The Challenges of Effectively Anonymizing Network Data

    Get PDF
    The availability of realistic network data plays a significant role in fostering collaboration and ensuring U.S. technical leadership in network security research. Unfortunately, a host of technical, legal, policy, and privacy issues limit the ability of operators to produce datasets for information security testing. In an effort to help overcome these limitations, several data collection efforts (e.g., CRAWDAD[14], PREDICT [34]) have been established in the past few years. The key principle used in all of these efforts to assure low-risk, high-value data is that of trace anonymization—the process of sanitizing data before release so that potentially sensitive information cannot be extracted

    Preserving Both Privacy and Utility in Network Trace Anonymization

    Full text link
    As network security monitoring grows more sophisticated, there is an increasing need for outsourcing such tasks to third-party analysts. However, organizations are usually reluctant to share their network traces due to privacy concerns over sensitive information, e.g., network and system configuration, which may potentially be exploited for attacks. In cases where data owners are convinced to share their network traces, the data are typically subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces real IP addresses with prefix-preserving pseudonyms. However, most such techniques either are vulnerable to adversaries with prior knowledge about some network flows in the traces, or require heavy data sanitization or perturbation, both of which may result in a significant loss of data utility. In this paper, we aim to preserve both privacy and utility through shifting the trade-off from between privacy and utility to between privacy and computational cost. The key idea is for the analysts to generate and analyze multiple anonymized views of the original network traces; those views are designed to be sufficiently indistinguishable even to adversaries armed with prior knowledge, which preserves the privacy, whereas one of the views will yield true analysis results privately retrieved by the data owner, which preserves the utility. We present the general approach and instantiate it based on CryptoPAn. We formally analyze the privacy of our solution and experimentally evaluate it using real network traces provided by a major ISP. The results show that our approach can significantly reduce the level of information leakage (e.g., less than 1\% of the information leaked by CryptoPAn) with comparable utility

    Reference models for network trace anonymization

    Get PDF
    Network security research can benefit greatly from testing environments that are capable of generating realistic, repeatable and configurable background traffic. In order to conduct network security experiments on systems such as Intrusion Detection Systems and Intrusion Prevention Systems, researchers require isolated testbeds capable of recreating actual network environments, complete with infrastructure and traffic details. Unfortunately, due to privacy and flexibility concerns, actual network traffic is rarely shared by organizations as sensitive information, such as IP addresses, device identity and behavioral information can be inferred from the traffic. Trace data anonymization is one solution to this problem. The research community has responded to this sanitization problem with anonymization tools that aim to remove sensitive information from network traces, and attacks on anonymized traces that aim to evaluate the efficacy of the anonymization schemes. However there is continued lack of a comprehensive model that distills all elements of the sanitization problem in to a functional reference model.;In this thesis we offer such a comprehensive functional reference model that identifies and binds together all the entities required to formulate the problem of network data anonymization. We build a new information flow model that illustrates the overly optimistic nature of inference attacks on anonymized traces. We also provide a probabilistic interpretation of the information model and develop a privacy metric for anonymized traces. Finally, we develop the architecture for a highly configurable, multi-layer network trace collection and sanitization tool. In addition to addressing privacy and flexibility concerns, our architecture allows for uniformity of anonymization and ease of data aggregation

    Building Cross-Cluster Security Models for Edge-Core Environments Involving Multiple Kubernetes Clusters

    Get PDF
    With the emergence of 5G networks and their large scale applications such as IoT and autonomous vehicles, telecom operators are increasingly offloading the computation closer to customers (i.e., on the edge). Such edge-core environments usually involve multiple Kubernetes clusters potentially owned by different providers. Confidentiality and privacy concerns could prevent those providers from sharing data freely with each other, which makes it challenging to perform common security tasks such as security verification and attack/anomaly detection across different clusters. In this work, we propose CCSM, a solution for building cross-cluster security models to enable various security analyses, while preserving confidentiality and privacy for each cluster. We design a six-step methodology to model both the cross-cluster communication and cross-cluster event dependency, and we apply those models to different security use cases. We implement our solution based on a 5G edge-core environment that involves multiple Kubernetes clusters, and our experi�mental results demonstrate its efficiency (e.g., less than 8 s of processing time for a model with 3,600 edges and nodes) and accuracy (e.g., more than 96% for cross-cluster event prediction

    Protecting Audit Data using Segmentation-Based Anonymization for Multi-Tenant Cloud Auditing (SEGGUARD)

    Get PDF
    With the rise of security concerns regarding cloud computing, the importance of security auditing, conducted either in-house or by a third party, has become evident more than ever. However, the input data required for auditing a multi-tenant cloud environment typically contains sensitive information, such as the topology of the underlying cloud infrastructure. Additionally, audit results intended for one tenant may unexpectedly reveal private information, such as unpatched security flaws, about other tenants. How to anonymize audit data and results in order to prevent such information leakage is a novel challenge that has received little attention. Directly applying most existing anonymization techniques to such a context would either lead to insufficient protection or render the data unsuitable for auditing. In this thesis, we propose SegGuard, a novel anonymization approach that protects the sensitive information in both the audit data and auditing results, while assuring the data utility for effective auditing. Specifically, SegGuard prevents cross-tenant information leakage through per-tenant encryption, and it prevents information leakage to auditors through an innovative way of applying property-preserving anonymization. We apply SegGuard on audit data collected from an OpenStack cloud, and evaluate its effectiveness and efficiency using both synthetic and real data. Our experimental results demonstrate that SegGuard can reduce information leakage to a negligible level (e.g., less than 1% for an adversary with 50% pre-knowledge) with a practical response time (e.g., 62 seconds to anonymize a cloud virtual infrastructure with 25,000 virtual machines)

    Anonymization of Event Logs for Network Security Monitoring

    Get PDF
    A managed security service provider (MSSP) must collect security event logs from their customers’ network for monitoring and cybersecurity protection. These logs need to be processed by the MSSP before displaying it to the security operation center (SOC) analysts. The employees generate event logs during their working hours at the customers’ site. One challenge is that collected event logs consist of personally identifiable information (PII) data; visible in clear text to the SOC analysts or any user with access to the SIEM platform. We explore how pseudonymization can be applied to security event logs to help protect individuals’ identities from the SOC analysts while preserving data utility when possible. We compare the impact of using different pseudonymization functions on sensitive information or PII. Non-deterministic methods provide higher level of privacy but reduced utility of the data. Our contribution in this thesis is threefold. First, we study available architectures with different threat models, including their strengths and weaknesses. Second, we study pseudonymization functions and their application to PII fields; we benchmark them individually, as well as in our experimental platform. Last, we obtain valuable feedbacks and lessons from SOC analysts based on their experience. Existing works[43, 44, 48, 39] are generally restricting to the anonymization of the IP traces, which is only one part of the SOC analysts’ investigation of PCAP files inspection. In one of the closest work[47], the authors provide useful, practical anonymization methods for the IP addresses, ports, and raw logs

    Novel Approaches to Preserving Utility in Privacy Enhancing Technologies

    Get PDF
    Significant amount of individual information are being collected and analyzed today through a wide variety of applications across different industries. While pursuing better utility by discovering knowledge from the data, an individual’s privacy may be compromised during an analysis: corporate networks monitor their online behavior, advertising companies collect and share their private information, and cybercriminals cause financial damages through security breaches. To this end, the data typically goes under certain anonymization techniques, e.g., CryptoPAn [Computer Networks’04], which replaces real IP addresses with prefix-preserving pseudonyms, or Differentially Private (DP) [ICALP’06] techniques which modify the answer to a query by adding a zero-mean noise distributed according to, e.g., a Laplace distribution. Unfortunately, most such techniques either are vulnerable to adversaries with prior knowledge, e.g., some network flows in the data, or require heavy data sanitization or perturbation, both of which may result in a significant loss of data utility. Therefore, the fundamental trade-off between privacy and utility (i.e., analysis accuracy) has attracted significant attention in various settings [ICALP’06, ACM CCS’14]. In line with this track of research, in this dissertation we aim to build utility-maximized and privacy-preserving tools for Internet communications. Such tools can be employed not only by dissidents and whistleblowers, but also by ordinary Internet users on a daily basis. To this end, we combine the development of practical systems with rigorous theoretical analysis, and incorporate techniques from various disciplines such as computer networking, cryptography, and statistical analysis. During the research, we proposed three different frameworks in some well-known settings outlined in the following. First, we propose The Multi-view Approach to preserve both privacy and utility in network trace anonymization, Second, The R2DP Approach which is a novel technique on differentially private mechanism design with maximized utility, and Third, The DPOD Approach that is a novel framework on privacy preserving Anomaly detection in the outsourcing setting
    corecore