    DESIGN AND DEVELOPMENT OF KEY REPRESENTATION AUDITING SCHEME FOR SECURE ONLINE AND DYNAMIC STATISTICAL DATABASES

    A statistical database (SDB) publishes statistical queries (such as sum, average, and count) on subsets of records. By combining the answers to several such queries, a malicious user (snooper) may be able to deduce confidential information about individuals. When a user submits a query to an SDB, the difficult problem is deciding whether the query can be answered safely; making that decision requires taking past queries into account, which is called SDB auditing. A major drawback of auditing, however, is the excessive CPU time and storage required to find and retrieve the relevant records from the SDB. The key representation auditing scheme (KRAS) is proposed to guarantee the security of online and dynamic SDBs. The core idea is to convert the original database into a key representation database (KRDB). The scheme also converts each new user query from a string representation into a key representation query (KRQ) and stores it in the Audit Query table (AQ table). Three audit stages are proposed to repel the snooper's attacks on the confidentiality of individuals, and efficient algorithms for these stages are presented: the First Stage Algorithm (FSA), the Second Stage Algorithm (SSA), and the Third Stage Algorithm (TSA). These algorithms enable the key representation auditor (KRA) to conveniently identify the illegal queries that could lead to disclosure of confidential data in the SDB. A comparative study of the new scheme and existing methods, consisting of a cost estimation and a statistical analysis, illustrates the savings in block accesses (CPU time) and storage space that are attainable when a KRDB is used. Finally, an implementation of the new scheme is presented and all components of the proposed system are discussed.
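To make the auditing problem in this abstract concrete, the following is a minimal sketch of classical sum-query auditing. It is not the KRAS scheme or its FSA/SSA/TSA algorithms, which the abstract does not specify. Each query is modelled as a 0/1 vector over record ids, and a new query is refused if, together with the queries already answered, it would let a snooper solve for a single record's value. The names `SumAuditor`, `rref`, and `discloses_individual` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Return the non-zero rows of the reduced row echelon form (exact arithmetic)."""
    m = [[Fraction(x) for x in r] for r in rows]
    if not m:
        return []
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [x / pivot for x in m[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m[:pivot_row]


def discloses_individual(query_vectors):
    """True if some single record's value is determined by the answered sums.

    A unit vector lies in the row space exactly when the RREF contains a row
    with a single non-zero entry.
    """
    return any(sum(1 for x in row if x != 0) == 1 for row in rref(query_vectors))


class SumAuditor:
    """Hypothetical auditor that remembers the query sets it has answered."""

    def __init__(self, n_records):
        self.n_records = n_records
        self.answered = []

    def submit(self, record_ids):
        """Answer the query only if doing so discloses no individual value."""
        vec = [1 if i in record_ids else 0 for i in range(self.n_records)]
        if discloses_individual(self.answered + [vec]):
            return False  # refuse: past answers plus this one isolate a record
        self.answered.append(vec)
        return True


auditor = SumAuditor(n_records=3)
first = auditor.submit({0, 1, 2})   # sum over all three records
second = auditor.submit({0, 1})     # would reveal record 2 by subtraction
```

In this sketch the decision depends only on which records a query touches, never on the stored values. The growing list of past queries also illustrates the storage and CPU cost that the abstract identifies as the main drawback of auditing.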

    Explanation-Based Auditing

    To comply with emerging privacy laws and regulations, it has become common for applications like electronic health records systems (EHRs) to collect access logs, which record each time a user (e.g., a hospital employee) accesses a piece of sensitive data (e.g., a patient record). Using the access log, it is easy to answer simple queries (e.g., Who accessed Alice's medical record?), but this often does not provide enough information. In addition to learning who accessed their medical records, patients will likely want to understand why each access occurred. In this paper, we introduce the problem of generating explanations for individual records in an access log. The problem is motivated by user-centric auditing applications, and it also provides a novel approach to misuse detection. We develop a framework for modeling explanations which is based on a fundamental observation: For certain classes of databases, including EHRs, the reason for most data accesses can be inferred from data stored elsewhere in the database. For example, if Alice has an appointment with Dr. Dave, this information is stored in the database, and it explains why Dr. Dave looked at Alice's record. Large numbers of data accesses can be explained using general forms called explanation templates. Rather than requiring an administrator to manually specify explanation templates, we propose a set of algorithms for automatically discovering frequent templates from the database (i.e., those that explain a large number of accesses). We also propose techniques for inferring collaborative user groups, which can be used to enhance the quality of the discovered explanations. Finally, we have evaluated our proposed techniques using an access log and data from the University of Michigan Health System. Our results demonstrate that in practice we can provide explanations for over 94% of data accesses in the log. Comment: VLDB201
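The appointment example in this abstract can be illustrated with a small sketch. It hard-codes a single explanation template ("the accessing user had an appointment with the patient") and splits an access log into explained and unexplained entries. The record layout and the function names are assumptions made for illustration, and the sketch does not attempt the paper's automatic template discovery.

```python
def explained_by_appointment(access, appointments):
    """Template: the user who accessed the record had an appointment with that patient."""
    return any(
        appt["doctor"] == access["user"] and appt["patient"] == access["patient"]
        for appt in appointments
    )


def split_accesses(access_log, appointments):
    """Partition the log into accesses the template explains and those it does not."""
    explained, unexplained = [], []
    for access in access_log:
        if explained_by_appointment(access, appointments):
            explained.append(access)
        else:
            unexplained.append(access)
    return explained, unexplained


# Hypothetical data mirroring the abstract's example.
appointments = [{"patient": "Alice", "doctor": "Dave"}]
access_log = [
    {"user": "Dave", "patient": "Alice"},  # explained by the appointment
    {"user": "Eve", "patient": "Alice"},   # no matching appointment
]

explained, unexplained = split_accesses(access_log, appointments)
```

Accesses left in `unexplained` are the ones a misuse-detection process would look at first, which is the connection to misuse detection that the abstract mentions.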

    Audit-based Compliance Control (AC2) for EHR Systems

    Traditionally, medical data has been stored and processed using paper-based files. Recently, medical facilities have started to store, access and exchange medical data in digital form. The drivers for this change are mainly demands for cost reduction and higher quality of health care. The main concerns when dealing with medical data are availability and confidentiality. Unavailability (even temporary) of medical data is expensive. Physicians may not be able to diagnose patients correctly, or they may have to repeat exams, adding to the overall costs of health care. In extreme cases availability of medical data can even be a matter of life or death. On the other hand, confidentiality of medical data is also important. Legislation requires medical facilities to observe the privacy of the patients, and states that patients have the final say on whether their medical data can be processed. Moreover, if physicians, or their EHR systems, are not trusted by the patients, for instance because of frequent privacy breaches, then patients may refuse to submit (correct) information, complicating the work of the physicians greatly. In traditional data protection systems, confidentiality and availability are conflicting requirements. The more data protection methods are applied to shield data from outsiders, the more likely it becomes that authorized persons will not get access to the data in time. Consider, for example, a password verification service that is temporarily unavailable, an access pass that someone forgot to bring, and so on. In this report we discuss a novel approach to data protection, Audit-based Compliance Control (AC2), and we argue that it is particularly suited for application in EHR systems. In AC2, a-priori access control is minimized to the mere authentication of users and objects, and their basic authorizations. More complex security procedures, such as checking user compliance with policies, are performed a-posteriori by using a formal and automated auditing mechanism. To support our claim we discuss legislation concerning the processing of health records, and we formalize a scenario involving medical personnel and a basic EHR system to show how AC2 can be used in practice. This report is based on previous work (Dekker & Etalle 2006) where we assessed the applicability of a-posteriori access control in a health care scenario. A more technically detailed article about AC2 recently appeared in the IJIS journal, where, however, we focussed on collaborative work environments (Cederquist, Corin, Dekker, Etalle, & Hartog, 2007). In this report we first provide background and related work before explaining the principal components of the AC2 framework. Moreover, we model a detailed EHR case study to show its operation in practice. We conclude by discussing how this framework meets current trends in healthcare and by highlighting the main advantages and drawbacks of using an a-posteriori access control mechanism as opposed to more traditional access control mechanisms.

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.

    PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn

    Preserving privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner, while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture, and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, thereby demonstrating the tradeoffs between privacy and utility needs, and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn. Comment: Conference information: ACM International Conference on Information and Knowledge Management (CIKM 2018)
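As background for the phrase "inspired by differential privacy", the sketch below shows the textbook Laplace mechanism applied to a count. It is not PriPeARL's design, which the abstract does not detail. The function names, the sensitivity of 1, and the seeded generator are assumptions for illustration. The Laplace noise is sampled as the difference of two exponential draws, which uses only the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    Smaller epsilon means more noise (stronger privacy, lower utility).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the sketch reproducible
reported_clicks = noisy_count(true_count=100, epsilon=1.0, rng=rng)
```

Because each call draws fresh noise, repeating the same query returns different values. That is one face of the consistency challenge the abstract lists alongside privacy, coverage, and utility.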