81 research outputs found
Preserving Both Privacy and Utility in Network Trace Anonymization
As network security monitoring grows more sophisticated, there is an
increasing need for outsourcing such tasks to third-party analysts. However,
organizations are usually reluctant to share their network traces due to
privacy concerns over sensitive information, e.g., network and system
configuration, which may potentially be exploited for attacks. In cases where
data owners are convinced to share their network traces, the data are typically
subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces
real IP addresses with prefix-preserving pseudonyms. However, most such
techniques either are vulnerable to adversaries with prior knowledge about some
network flows in the traces, or require heavy data sanitization or
perturbation, both of which may result in a significant loss of data utility.
In this paper, we aim to preserve both privacy and utility through shifting the
trade-off from between privacy and utility to between privacy and computational
cost. The key idea is for the analysts to generate and analyze multiple
anonymized views of the original network traces; those views are designed to be
sufficiently indistinguishable even to adversaries armed with prior knowledge,
which preserves the privacy, whereas one of the views will yield true analysis
results privately retrieved by the data owner, which preserves the utility. We
present the general approach and instantiate it based on CryptoPAn. We formally
analyze the privacy of our solution and experimentally evaluate it using real
network traces provided by a major ISP. The results show that our approach can
significantly reduce the level of information leakage (e.g., less than 1\% of
the information leaked by CryptoPAn) with comparable utility
FLAIM: A Multi-level Anonymization Framework for Computer and Network Logs
FLAIM (Framework for Log Anonymization and Information Management) addresses
two important needs not well addressed by current log anonymizers. First, it is
extremely modular and not tied to the specific log being anonymized. Second, it
supports multi-level anonymization, allowing system administrators to make
fine-grained trade-offs between information loss and privacy/security concerns.
In this paper, we examine anonymization solutions to date and note the above
limitations in each. We further describe how FLAIM addresses these problems,
and we describe FLAIM's architecture and features in detail.Comment: 16 pages, 4 figures, in submission to USENIX Lis
Catch, Clean, and Release: A Survey of Obstacles and Opportunities for Network Trace Sanitization
Network researchers benefit tremendously from access to traces of production networks, and several repositories of such network traces exist. By their very nature, these traces capture sensitive business and personal activity. Furthermore, network traces contain significant operational information about the target network, such as its structure, identity of the network provider, or addresses of important servers. To protect private or proprietary information, researchers must “sanitize” a trace before sharing it. \par In this chapter, we survey the growing body of research that addresses the risks, methods, and evaluation of network trace sanitization. Research on the risks of network trace sanitization attempts to extract information from published network traces, while research on sanitization methods investigates approaches that may protect against such attacks. Although researchers have recently proposed both quantitative and qualitative methods to evaluate the effectiveness of sanitization methods, such work has several shortcomings, some of which we highlight in a discussion of open problems. Sanitizing a network trace, however challenging, remains an important method for advancing network–based research
The devil and packet trace anonymization,”
ABSTRACT Releasing network measurement data-including packet tracesto the research community is a virtuous activity that promotes solid research. However, in practice, releasing anonymized packet traces for public use entails many more vexing considerations than just the usual notion of how to scramble IP addresses to preserve privacy. Publishing traces requires carefully balancing the security needs of the organization providing the trace with the research usefulness of the anonymized trace. In this paper we recount our experiences in (i) securing permission from a large site to release packet header traces of the site's internal traffic, (ii) implementing the corresponding anonymization policy, and (iii) validating its correctness. We present a general tool, tcpmkpub, for anonymizing traces, discuss the process used to determine the particular anonymization policy, and describe the use of meta-data accompanying the traces to provide insight into features that have been obfuscated by anonymization
Privacy-safe network trace sharing via secure queries
Privacy concerns relating to sharing network traces have traditionally been handled via sanitization, which includes removal of sensitive data and IP address anonymization. We argue that sanitization is a poor solution for data sharing that offers insufficient research utility to users and poor privacy guarantees to data providers. We claim that a better balance in the utility/privacy tradeoff, inherent to network data sharing, can be achieved via a new paradigm we propose: secure queries. In this paradigm, a data owner publishes a query language and an online portal, allowing researchers to submit sets of queries to be run on data. Only certain operations are allowed on certain data fields, and in specific contexts. Query restriction is achieved via the provider’s privacy policy, and enforced by the language’s interpreter. Query results, returned to researchers, consist of aggregate information such as counts, histograms, distributions, etc. and not of individual packets. We discuss why secure queries provide higher privacy guarantees and higher research utility than sanitization, and present a design of the secure query language and a privacy policy
The devil and packet trace anonymization,”
ABSTRACT Releasing network measurement data-including packet traces-to the research community is a virtuous activity that promotes solid research. However, in practice, releasing anonymized packet traces for public use entails many more vexing considerations than just the usual notion of how to scramble IP addresses to preserve privacy. Publishing traces requires carefully balancing the security needs of the organization providing the trace with the research usefulness of the anonymized trace. In this paper we recount our experiences in (i) securing permission from a large site to release packet header traces of the site's internal traffic, (ii) implementing the corresponding anonymization policy, and (iii) validating its correctness. We present a general tool, tcpmkpub, for anonymizing traces, discuss the process used to determine the particular anonymization policy, and describe the use of metadata accompanying the traces to provide insight into features that have been obfuscated by anonymization
Reference models for network trace anonymization
Network security research can benefit greatly from testing environments that are capable of generating realistic, repeatable and configurable background traffic. In order to conduct network security experiments on systems such as Intrusion Detection Systems and Intrusion Prevention Systems, researchers require isolated testbeds capable of recreating actual network environments, complete with infrastructure and traffic details. Unfortunately, due to privacy and flexibility concerns, actual network traffic is rarely shared by organizations as sensitive information, such as IP addresses, device identity and behavioral information can be inferred from the traffic. Trace data anonymization is one solution to this problem. The research community has responded to this sanitization problem with anonymization tools that aim to remove sensitive information from network traces, and attacks on anonymized traces that aim to evaluate the efficacy of the anonymization schemes. However there is continued lack of a comprehensive model that distills all elements of the sanitization problem in to a functional reference model.;In this thesis we offer such a comprehensive functional reference model that identifies and binds together all the entities required to formulate the problem of network data anonymization. We build a new information flow model that illustrates the overly optimistic nature of inference attacks on anonymized traces. We also provide a probabilistic interpretation of the information model and develop a privacy metric for anonymized traces. Finally, we develop the architecture for a highly configurable, multi-layer network trace collection and sanitization tool. In addition to addressing privacy and flexibility concerns, our architecture allows for uniformity of anonymization and ease of data aggregation
- …