6,691 research outputs found
DESIGN AND DEVELOPMENT OF KEY REPRESENTATION AUDITING SCHEME FOR SECURE ONLINE AND DYNAMIC STATISTICAL DATABASES
A statistical database (SDB) publishes statistical queries (such as sum, average, count,
etc.) on subsets of records. Sometimes by stitching the answers of some statistics, a
malicious user (snooper) may be able to deduce confidential information about some
individuals. When a user submits a query to statistical database, the difficult problem
is how to decide whether the query is answerable or not; to make a decision, past
queries must be taken into account, which is called SDB auditing. One of the major
drawbacks of the auditing, however, is its excessive CPU time and storage
requirements to find and retrieve the relevant records from the SDB.
The key representation auditing scheme (KRAS) is proposed to guarantee the
security of online and dynamic SDBs. The core idea is to convert the original
database into a key representation database (KRDB), also this scheme involves
converting each new user query from a string representation into a key representation
query (KRQ) and storing it in the Audit Query table (AQ table). Three audit stages are
proposed to repel the attacks of the snooper to the confidentiality of the individuals.
Also, efficient algorithms for these stages are presented, namely the First Stage
Algorithm (FSA), the Second Stage Algorithm (SSA) and the Third Stage Algorithm
(TSA). These algorithms enable the key representation auditor (KRA) to conveniently
specify the illegal queries which could lead to disclosing the SDB.
A comparative study is made between the new scheme and the existing methods,
namely a cost estimation and a statistical analysis are performed, and it illustrates the
savings in block accesses (CPU time) and storage space that are attainable when a
KRDB is used. Finally, an implementation of the new scheme is performed and all the
components of the proposed system are discussed
Explanation-Based Auditing
To comply with emerging privacy laws and regulations, it has become common
for applications like electronic health records systems (EHRs) to collect
access logs, which record each time a user (e.g., a hospital employee) accesses
a piece of sensitive data (e.g., a patient record). Using the access log, it is
easy to answer simple queries (e.g., Who accessed Alice's medical record?), but
this often does not provide enough information. In addition to learning who
accessed their medical records, patients will likely want to understand why
each access occurred. In this paper, we introduce the problem of generating
explanations for individual records in an access log. The problem is motivated
by user-centric auditing applications, and it also provides a novel approach to
misuse detection. We develop a framework for modeling explanations which is
based on a fundamental observation: For certain classes of databases, including
EHRs, the reason for most data accesses can be inferred from data stored
elsewhere in the database. For example, if Alice has an appointment with Dr.
Dave, this information is stored in the database, and it explains why Dr. Dave
looked at Alice's record. Large numbers of data accesses can be explained using
general forms called explanation templates. Rather than requiring an
administrator to manually specify explanation templates, we propose a set of
algorithms for automatically discovering frequent templates from the database
(i.e., those that explain a large number of accesses). We also propose
techniques for inferring collaborative user groups, which can be used to
enhance the quality of the discovered explanations. Finally, we have evaluated
our proposed techniques using an access log and data from the University of
Michigan Health System. Our results demonstrate that in practice we can provide
explanations for over 94% of data accesses in the log.Comment: VLDB201
Audit-based Compliance Control (AC2) for EHR Systems
Traditionally, medical data is stored and processed using paper-based files. Recently, medical facilities have started to store, access and exchange medical data in digital form. The drivers for this change are mainly demands for cost reduction, and higher quality of health care. The main concerns when dealing with medical data are availability and confidentiality. Unavailability (even temporary) of medical data is expensive. Physicians may not be able to diagnose patients correctly, or they may have to repeat exams, adding to the overall costs of health care. In extreme cases availability of medical data can even be a matter of life or death. On the other hand, confidentiality of medical data is also important. Legislation requires medical facilities to observe the privacy of the patients, and states that patients have a final say on whether or not their medical data can be processed or not. Moreover, if physicians, or their EHR systems, are not trusted by the patients, for instance because of frequent privacy breaches, then patients may refuse to submit (correct) information, complicating the work of the physicians greatly. \ud
\ud
In traditional data protection systems, confidentiality and availability are conflicting requirements. The more data protection methods are applied to shield data from outsiders the more likely it becomes that authorized persons will not get access to the data in time. Consider for example, a password verification service that is temporarily not available, an access pass that someone forgot to bring, and so on. In this report we discuss a novel approach to data protection, Audit-based Compliance Control (AC2), and we argue that it is particularly suited for application in EHR systems. In AC2, a-priori access control is minimized to the mere authentication of users and objects, and their basic authorizations. More complex security procedures, such as checking user compliance to policies, are performed a-posteriori by using a formal and automated auditing mechanism. To support our claim we discuss legislation concerning the processing of health records, and we formalize a scenario involving medical personnel and a basic EHR system to show how AC2 can be used in practice. \ud
\ud
This report is based on previous work (Dekker & Etalle 2006) where we assessed the applicability of a-posteriori access control in a health care scenario. A more technically detailed article about AC2 recently appeared in the IJIS journal, where we focussed however on collaborative work environments (Cederquist, Corin, Dekker, Etalle, & Hartog, 2007). In this report we first provide background and related work before explaining the principal components of the AC2 framework. Moreover we model a detailed EHR case study to show its operation in practice. We conclude by discussing how this framework meets current trends in healthcare and by highlighting the main advantages and drawbacks of using an a-posteriori access control mechanism as opposed to more traditional access control mechanisms
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
Preserving privacy of users is a key requirement of web-scale analytics and
reporting applications, and has witnessed a renewed focus in light of recent
data breaches and new regulations such as GDPR. We focus on the problem of
computing robust, reliable analytics in a privacy-preserving manner, while
satisfying product requirements. We present PriPeARL, a framework for
privacy-preserving analytics and reporting, inspired by differential privacy.
We describe the overall design and architecture, and the key modeling
components, focusing on the unique challenges associated with privacy,
coverage, utility, and consistency. We perform an experimental study in the
context of ads analytics and reporting at LinkedIn, thereby demonstrating the
tradeoffs between privacy and utility needs, and the applicability of
privacy-preserving mechanisms to real-world data. We also highlight the lessons
learned from the production deployment of our system at LinkedIn.Comment: Conference information: ACM International Conference on Information
and Knowledge Management (CIKM 2018
- …