1,134 research outputs found

    An Effective Private Data storage and Retrieval System using Secret sharing scheme based on Secure Multi-party Computation

    Full text link
    Privacy of the outsourced data is one of the major challenge.Insecurity of the network environment and untrustworthiness of the service providers are obstacles of making the database as a service.Collection and storage of personally identifiable information is a major privacy concern.On-line public databases and resources pose a significant risk to user privacy, since a malicious database owner may monitor user queries and infer useful information about the customer.The challenge in data privacy is to share data with third-party and at the same time securing the valuable information from unauthorized access and use by third party.A Private Information Retrieval(PIR) scheme allows a user to query database while hiding the identity of the data retrieved.The naive solution for confidentiality is to encrypt data before outsourcing.Query execution,key management and statistical inference are major challenges in this case.The proposed system suggests a mechanism for secure storage and retrieval of private data using the secret sharing technique.The idea is to develop a mechanism to store private information with a highly available storage provider which could be accessed from anywhere using queries while hiding the actual data values from the storage provider.The private information retrieval system is implemented using Secure Multi-party Computation(SMC) technique which is based on secret sharing. Multi-party Computation enable parties to compute some joint function over their private inputs.The query results are obtained by performing a secure computation on the shares owned by the different servers.Comment: Data Science & Engineering (ICDSE), 2014 International Conference, CUSA

    Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing

    Full text link
    With the onset of the Information Era and the rapid growth of information technology, ample space for processing and extracting data has opened up. However, privacy concerns may stifle expansion throughout this area. The challenge of reliable mining techniques when transactions disperse across sources is addressed in this study. This work looks at the prospect of creating a new set of three algorithms that can obtain maximum privacy, data utility, and time savings while doing so. This paper proposes a unique double encryption and Transaction Splitter approach to alter the database to optimize the data utility and confidentiality tradeoff in the preparation phase. This paper presents a customized apriori approach for the mining process, which does not examine the entire database to estimate the support for each attribute. Existing distributed data solutions have a high encryption complexity and an insufficient specification of many participants' properties. Proposed solutions provide increased privacy protection against a variety of attack models. Furthermore, in terms of communication cycles and processing complexity, it is much simpler and quicker. Proposed work tests on top of a realworld transaction database demonstrate that the aim of the proposed method is realistic

    Privacy-preserving data outsourcing in the cloud via semantic data splitting

    Full text link
    Even though cloud computing provides many intrinsic benefits, privacy concerns related to the lack of control over the storage and management of the outsourced data still prevent many customers from migrating to the cloud. Several privacy-protection mechanisms based on a prior encryption of the data to be outsourced have been proposed. Data encryption offers robust security, but at the cost of hampering the efficiency of the service and limiting the functionalities that can be applied over the (encrypted) data stored on cloud premises. Because both efficiency and functionality are crucial advantages of cloud computing, in this paper we aim at retaining them by proposing a privacy-protection mechanism that relies on splitting (clear) data, and on the distributed storage offered by the increasingly popular notion of multi-clouds. We propose a semantically-grounded data splitting mechanism that is able to automatically detect pieces of data that may cause privacy risks and split them on local premises, so that each chunk does not incur in those risks; then, chunks of clear data are independently stored into the separate locations of a multi-cloud, so that external entities cannot have access to the whole confidential data. Because partial data are stored in clear on cloud premises, outsourced functionalities are seamlessly and efficiently supported by just broadcasting queries to the different cloud locations. To enforce a robust privacy notion, our proposal relies on a privacy model that offers a priori privacy guarantees; to ensure its feasibility, we have designed heuristic algorithms that minimize the number of cloud storage locations we need; to show its potential and generality, we have applied it to the least structured and most challenging data type: plain textual documents

    PRESERVING PRIVACY IN DATA RELEASE

    Get PDF
    Data sharing and dissemination play a key role in our information society. Not only do they prove to be advantageous to the involved parties, but they can also be fruitful to the society at large (e.g., new treatments for rare diseases can be discovered based on real clinical trials shared by hospitals and pharmaceutical companies). The advancements in the Information and Communication Technology (ICT) make the process of releasing a data collection simpler than ever. The availability of novel computing paradigms, such as data outsourcing and cloud computing, make scalable, reliable and fast infrastructures a dream come true at reasonable costs. As a natural consequence of this scenario, data owners often rely on external storage servers for releasing their data collections, thus delegating the burden of data storage and management to the service provider. Unfortunately, the price to be paid when releasing a collection of data is in terms of unprecedented privacy risks. Data collections often include sensitive information, not intended for disclosure, that should be properly protected. The problem of protecting privacy in data release has been under the attention of the research and development communities for a long time. However, the richness of released data, the large number of available sources, and the emerging outsourcing/cloud scenarios raise novel problems, not addressed by traditional approaches, which need enhanced solutions. In this thesis, we define a comprehensive approach for protecting sensitive information when large collections of data are publicly or selectively released by their owners. In a nutshell, this requires protecting data explicitly included in the release, as well as protecting information not explicitly released but that could be exposed by the release, and ensuring that access to released data be allowed only to authorized parties according to the data owners\u2019 policies. More specifically, these three aspects translate to three requirements, addressed by this thesis, which can be summarized as follows. The first requirement is the protection of data explicitly included in a release. While intuitive, this requirement is complicated by the fact that privacy-enhancing techniques should not prevent recipients from performing legitimate analysis on the released data but, on the contrary, should ensure sufficient visibility over non sensitive information. We therefore propose a solution, based on a novel formulation of the fragmentation approach, that vertically fragments a data collection so to satisfy requirements for both information protection and visibility, and we complement it with an effective means for enriching the utility of the released data. The second requirement is the protection of data not explicitly included in a release. As a matter of fact, even a collection of non sensitive data might enable recipients to infer (possibly sensitive) information not explicitly disclosed but that somehow depends on the released information (e.g., the release of the treatment with which a patient is being cared can leak information about her disease). To address this requirement, starting from a real case study, we propose a solution for counteracting the inference of sensitive information that can be drawn observing peculiar value distributions in the released data collection. The third requirement is access control enforcement. Available solutions fall short for a variety of reasons. Traditional access control mechanisms are based on a reference monitor and do not fit outsourcing/cloud scenarios, since neither the data owner is willing, nor the cloud storage server is trusted, to enforce the access control policy. Recent solutions for access control enforcement in outsourcing scenarios assume outsourced data to be read-only and cannot easily manage (dynamic) write authorizations. We therefore propose an approach for efficiently supporting grant and revoke of write authorizations, building upon the selective encryption approach, and we also define a subscription-based authorization policy, to fit real-world scenarios where users pay for a service and access the resources made available during their subscriptions. The main contributions of this thesis can therefore be summarized as follows. With respect to the protection of data explicitly included in a release, our original results are: i) a novel modeling of the fragmentation problem; ii) an efficient technique for computing a fragmentation, based on reduced Ordered Binary Decision Diagrams (OBDDs) to formulate the conditions that a fragmentation must satisfy; iii) the computation of a minimal fragmentation not fragmenting data more than necessary, with the definition of both an exact and an heuristic algorithms, which provides faster computational time while well approximating the exact solutions; and iv) the definition of loose associations, a sanitized form of the sensitive associations broken by fragmentation that can be safely released, specifically extended to operate on arbitrary fragmentations. With respect to the protection of data not explicitly included in a release, our original results are: i) the definition of a novel and unresolved inference scenario, raised from a real case study where data items are incrementally released upon request; ii) the definition of several metrics to assess the inference exposure due to a data release, based upon the concepts of mutual information, Kullback-Leibler distance between distributions, Pearson\u2019s cumulative statistic, and Dixon\u2019s coefficient; and iii) the identification of a safe release with respect to the considered inference channel and the definition of the controls to be enforced to guarantee that no sensitive information be leaked releasing non sensitive data items. With respect to access control enforcement, our original results are: i) the management of dynamic write authorizations, by defining a solution based on selective encryption for efficiently and effectively supporting grant and revoke of write authorizations; ii) the definition of an effective technique to guarantee data integrity, so to allow the data owner and the users to verify that modifications to a resource have been produced only by authorized users; and iii) the modeling and enforcement of a subscription-based authorization policy, to support scenarios where both the set of users and the set of resources change frequently over time, and users\u2019 authorizations are based on their subscriptions

    Privacy by Design in Data Mining

    Get PDF
    Privacy is ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data, unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at organization level. Unfortunately, it is in- creasingly hard to transform the data in a way that it protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technol- ogy by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm that sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones. This thesis investigates on two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing

    Aggregating privatized medical data for secure querying applications

    Full text link
     This thesis analyses and examines the challenges of aggregation of sensitive data and data querying on aggregated data at cloud server. This thesis also delineates applications of aggregation of sensitive medical data in several application scenarios, and tests privatization techniques to assist in improving the strength of privacy and utility

    Loose associations to increase utility in data publishing

    Get PDF
    Data fragmentation has been proposed as a solution for protecting the confidentiality of sensitive associations when releasing data for publishing or external storage. To enrich the utility of data fragments, a recent approach has put forward the idea of complementing a pair of fragments with some (non precise, hence loose) information on the association between them. Starting from the observation that in presence of multiple fragments the publication of several independent associations between pairs of fragments can cause improper leakage of sensitive information, in this paper we extend loose associations to operate over an arbitrary number of fragments. We first illustrate how the publication of multiple loose associations between different pairs of fragments can potentially expose sensitive associations, and describe an approach for defining loose associations among an arbitrary set of fragments. We investigate how tuples in fragments can be grouped for producing loose associations so to increase the utility of queries executed over fragments. We then provide a heuristics for performing such a grouping and producing loose associations satisfying a given level of protection for sensitive associations, while achieving utility for queries over different fragments. We also illustrate the result of an extensive experimental effort over both synthetic and real datasets, which shows the efficiency and the enhanced utility provided by our proposal
    • …
    corecore