Search CORE

3,452 research outputs found

Equivalence-based Security for Querying Encrypted Databases: Theory and Application to Privacy Policy Audits

Author: Basin D.
Basin D.
Bauer A.
Holt J. E.
Kelsey J.
Popa R. A.
Schneier B.
Song D. X.
Waters B. R.
Publication venue
Publication date: 01/01/2015
Field of study

Motivated by the problem of simultaneously preserving confidentiality and usability of data outsourced to third-party clouds, we present two different database encryption schemes that largely hide data but reveal enough information to support a wide-range of relational queries. We provide a security definition for database encryption that captures confidentiality based on a notion of equivalence of databases from the adversary's perspective. As a specific application, we adapt an existing algorithm for finding violations of privacy policies to run on logs encrypted under our schemes and observe low to moderate overheads.Comment: CCS 2015 paper technical report, in progres

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

MPG.PuRe

Data Provenance Inference in Machine Learning

Author: Li Xiang-Yang
Xu Mingxue
Publication venue
Publication date: 23/11/2022
Field of study

Unintended memorization of various information granularity has garnered academic attention in recent years, e.g. membership inference and property inference. How to inversely use this privacy leakage to facilitate real-world applications is a growing direction; the current efforts include dataset ownership inference and user auditing. Standing on the data lifecycle and ML model production, we propose an inference process named Data Provenance Inference, which is to infer the generation, collection or processing property of the ML training data, to assist ML developers in locating the training data gaps without maintaining strenuous metadata. We formularly define the data provenance and the data provenance inference task in ML training. Then we propose a novel inference strategy combining embedded-space multiple instance classification and shadow learning. Comprehensive evaluations cover language, visual and structured data in black-box and white-box settings, with diverse kinds of data provenance (i.e. business, county, movie, user). Our best inference accuracy achieves 98.96% in the white-box text model when "author" is the data provenance. The experimental results indicate that, in general, the inference performance positively correlated with the amount of reference data for inference, the depth and also the amount of the parameter of the accessed layer. Furthermore, we give a post-hoc statistical analysis of the data provenance definition to explain when our proposed method works well

arXiv.org e-Print Archive

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication venue
Publication date: 29/03/2015
Field of study

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

arXiv.org e-Print Archive

eScholarship - University of California

Explanation-Based Auditing

Author: Fabbri Daniel
LeFevre Kristen
Publication venue
Publication date: 01/01/2011
Field of study

To comply with emerging privacy laws and regulations, it has become common for applications like electronic health records systems (EHRs) to collect access logs, which record each time a user (e.g., a hospital employee) accesses a piece of sensitive data (e.g., a patient record). Using the access log, it is easy to answer simple queries (e.g., Who accessed Alice's medical record?), but this often does not provide enough information. In addition to learning who accessed their medical records, patients will likely want to understand why each access occurred. In this paper, we introduce the problem of generating explanations for individual records in an access log. The problem is motivated by user-centric auditing applications, and it also provides a novel approach to misuse detection. We develop a framework for modeling explanations which is based on a fundamental observation: For certain classes of databases, including EHRs, the reason for most data accesses can be inferred from data stored elsewhere in the database. For example, if Alice has an appointment with Dr. Dave, this information is stored in the database, and it explains why Dr. Dave looked at Alice's record. Large numbers of data accesses can be explained using general forms called explanation templates. Rather than requiring an administrator to manually specify explanation templates, we propose a set of algorithms for automatically discovering frequent templates from the database (i.e., those that explain a large number of accesses). We also propose techniques for inferring collaborative user groups, which can be used to enhance the quality of the discovered explanations. Finally, we have evaluated our proposed techniques using an access log and data from the University of Michigan Health System. Our results demonstrate that in practice we can provide explanations for over 94% of data accesses in the log.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Advanced security infrastructures for grid education

Author: Sinnott R.O.
Stell A.J.
Watt J.
Publication venue
Publication date: 01/01/2006
Field of study

This paper describes the research conducted into advanced authorization infrastructures at the National e-Science Centre (NeSC) at the University of Glasgow and their application to support a teaching environment as part of the Dynamic Virtual Organisations in e-Science Education (DyVOSE) project. We outline the lessons learnt in teaching Grid computing and rolling out the associated security authorisation infrastructures, and describe our plans for a future, extended security infrastructure for dynamic establishment of inter-institutional virtual organisations (VO) in the education domain

CiteSeerX

Enlighten

University of Melbourne Institutional Repository

Advanced security infrastructures for grid education

Author: Sinnott R.O.
Stell A.J.
Watt J.
Publication venue
Publication date: 01/07/2006
Field of study

Enlighten