7,953 research outputs found

    The Data Surveillance State in Europe and the United States

    Get PDF

    The Data Surveillance State in Europe and the United States

    Get PDF

    Interestingness measure on privacy preserved data with horizontal partitioning

    Get PDF
    Association rule mining is a process of finding the frequent item sets based on the interestingness measure. The major challenge exists when performing the association of the data where privacy preservation is emphasized. The actual transaction data provides the evident to calculate the parameters for defining the association rules. In this paper, a solution is proposed to find one such parameter i.e. support count for item sets on the non transparent data, in other words the transaction data is not disclosed. The privacy preservation is ensured by transferring the x-anonymous records for every transaction record. All the anonymous set of actual transaction record perceives high generalized values. The clients process the anonymous set of every transaction record to arrive at high abstract values and these generalized values are used for support calculation. More the number of anonymous records, more the privacy of data is amplified. In experimental results it is shown that privacy is ensured with more number of formatted transactions

    Privacy Preservation by Disassociation

    Full text link
    In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.Comment: VLDB201

    BALANCING PRIVACY, PRECISION AND PERFORMANCE IN DISTRIBUTED SYSTEMS

    Get PDF
    Privacy, Precision, and Performance (3Ps) are three fundamental design objectives in distributed systems. However, these properties tend to compete with one another and are not considered absolute properties or functions. They must be defined and justified in terms of a system, its resources, stakeholder concerns, and the security threat model. To date, distributed systems research has only considered the trade-offs of balancing privacy, precision, and performance in a pairwise fashion. However, this dissertation formally explores the space of trade-offs among all 3Ps by examining three representative classes of distributed systems, namely Wireless Sensor Networks (WSNs), cloud systems, and Data Stream Management Systems (DSMSs). These representative systems support large part of the modern and mission-critical distributed systems. WSNs are real-time systems characterized by unreliable network interconnections and highly constrained computational and power resources. The dissertation proposes a privacy-preserving in-network aggregation protocol for WSNs demonstrating that the 3Ps could be navigated by adopting the appropriate algorithms and cryptographic techniques that are not prohibitively expensive. Next, the dissertation highlights the privacy and precision issues that arise in cloud databases due to the eventual consistency models of the cloud. To address these issues, consistency enforcement techniques across cloud servers are proposed and the trade-offs between 3Ps are discussed to help guide cloud database users on how to balance these properties. Lastly, the 3Ps properties are examined in DSMSs which are characterized by high volumes of unbounded input data streams and strict real-time processing constraints. Within this system, the 3Ps are balanced through a proposed simple and efficient technique that applies access control policies over shared operator networks to achieve privacy and precision without sacrificing the systems performance. Despite that in this dissertation, it was shown that, with the right set of protocols and algorithms, the desirable 3P properties can co-exist in a balanced way in well-established distributed systems, this dissertation is promoting the use of the new 3Ps-by-design concept. This concept is meant to encourage distributed systems designers to proactively consider the interplay among the 3Ps from the initial stages of the systems design lifecycle rather than identifying them as add-on properties to systems

    Avoiding disclosure of individually identifiable health information: a literature review

    Get PDF
    Achieving data and information dissemination without arming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations for the most common access methods. Although there is abundant theoretical and empirical research, their review reveals lack of consensus on fundamental questions for empirical practice: How to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.public use files, disclosure avoidance, reidentification, de-identification, data utility

    Cross-border Access to Electronic Data through Judicial Cooperation in Criminal Matters. State of the art and latest developments in the EU and the US. CEPS Liberty and Security in Europe Papers No. 2018-07, November 2018

    Get PDF
    In the digital age, access to data sought in the framework of a criminal investigation often entails the exercise of prosecuting powers over individuals and material that fall under another jurisdiction. Mutual legal assistance treaties, and the European Investigation Order allow for the lawful collection of electronic information in cross-border proceedings. These instruments rely on formal judicial cooperation between competent authorities in the different countries concerned by the investigative measure. By subjecting foreign actors’ requests for data to domestic independent judicial scrutiny, they guarantee that the information sought during an investigation is lawfully obtained and admissible in court. At the same time, pressure is mounting within the EU and in the US to allow law enforcement authorities’ access to data outside existing judicial cooperation channels. Initiatives such as the European Commission’s proposals on electronic evidence and the CLOUD Act in the US foster a model of direct private–public crossborder cooperation under which service providers receive, assess and respond directly to a foreign law enforcement order to produce or preserve electronic information. This paper scrutinises these recent EU and US initiatives in light of the fundamental rights standards, rule of law touchstones, and secondary norms that, in the EU legal system, must be observed to ensure the lawful collection and exchange of data for criminal justice purposes. A series of doubts are raised as to the Commission e-evidence proposal and the CLOUD Act’s compatibility with the legality, necessity and proportionality benchmarks provided under EU primary and secondary law

    Extending Complex Event Processing for Advanced Applications

    Get PDF
    Recently numerous emerging applications, ranging from on-line financial transactions, RFID based supply chain management, traffic monitoring to real-time object monitoring, generate high-volume event streams. To meet the needs of processing event data streams in real-time, Complex Event Processing technology (CEP) has been developed with the focus on detecting occurrences of particular composite patterns of events. By analyzing and constructing several real-world CEP applications, we found that CEP needs to be extended with advanced services beyond detecting pattern queries. We summarize these emerging needs in three orthogonal directions. First, for applications which require access to both streaming and stored data, we need to provide a clear semantics and efficient schedulers in the face of concurrent access and failures. Second, when a CEP system is deployed in a sensitive environment such as health care, we wish to mitigate possible privacy leaks. Third, when input events do not carry the identification of the object being monitored, we need to infer the probabilistic identification of events before feed them to a CEP engine. Therefore this dissertation discusses the construction of a framework for extending CEP to support these critical services. First, existing CEP technology is limited in its capability of reacting to opportunities and risks detected by pattern queries. We propose to tackle this unsolved problem by embedding active rule support within the CEP engine. The main challenge is to handle interactions between queries and reactions to queries in the high-volume stream execution. We hence introduce a novel stream-oriented transactional model along with a family of stream transaction scheduling algorithms that ensure the correctness of concurrent stream execution. And then we demonstrate the proposed technology by applying it to a real-world healthcare system and evaluate the stream transaction scheduling algorithms extensively using real-world workload. Second, we are the first to study the privacy implications of CEP systems. Specifically we consider how to suppress events on a stream to reduce the disclosure of sensitive patterns, while ensuring that nonsensitive patterns continue to be reported by the CEP engine. We formally define the problem of utility-maximizing event suppression for privacy preservation. We then design a suite of real-time solutions that eliminate private pattern matches while maximizing the overall utility. Our first solution optimally solves the problem at the event-type level. The second solution, at event-instance level, further optimizes the event-type level solution by exploiting runtime event distributions using advanced pattern match cardinality estimation techniques. Our experimental evaluation over both real-world and synthetic event streams shows that our algorithms are effective in maximizing utility yet still efficient enough to offer near real time system responsiveness. Third, we observe that in many real-world object monitoring applications where the CEP technology is adopted, not all sensed events carry the identification of the object whose action they report on, so called €œnon-ID-ed€� events. Such non-ID-ed events prevent us from performing object-based analytics, such as tracking, alerting and pattern matching. We propose a probabilistic inference framework to tackle this problem by inferring the missing object identification associated with an event. Specifically, as a foundation we design a time-varying graphic model to capture correspondences between sensed events and objects. Upon this model, we elaborate how to adapt the state-of-the-art Forward-backward inference algorithm to continuously infer probabilistic identifications for non-ID-ed events. More important, we propose a suite of strategies for optimizing the performance of inference. Our experimental results, using large-volume streams of a real-world health care application, demonstrate the accuracy, efficiency, and scalability of the proposed technology
    • …
    corecore