34,454 research outputs found
Privacy-preserving scoring of tree ensembles : a novel framework for AI in healthcare
Machine Learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data with personally identifiable information (PII). Yet very little of SMC-based PPML has been put into practice so far. In this paper we present the very first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to a ML scoring service and obtain encrypted class labels without the scoring service actually seeing that input in the clear. We then describe the deployment challenges we solved to integrate these protocols in a cloud based scalable risk-prediction platform with multiple ML models for healthcare AI. Included are system internals, and evaluations of our deployment for supporting physicians to drive better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare
State of The Art and Hot Aspects in Cloud Data Storage Security
Along with the evolution of cloud computing and cloud storage towards matu-
rity, researchers have analyzed an increasing range of cloud computing security
aspects, data security being an important topic in this area. In this paper, we
examine the state of the art in cloud storage security through an overview of
selected peer reviewed publications. We address the question of defining cloud
storage security and its different aspects, as well as enumerate the main vec-
tors of attack on cloud storage. The reviewed papers present techniques for key
management and controlled disclosure of encrypted data in cloud storage, while
novel ideas regarding secure operations on encrypted data and methods for pro-
tection of data in fully virtualized environments provide a glimpse of the toolbox
available for securing cloud storage. Finally, new challenges such as emergent
government regulation call for solutions to problems that did not receive enough
attention in earlier stages of cloud computing, such as for example geographical
location of data. The methods presented in the papers selected for this review
represent only a small fraction of the wide research effort within cloud storage
security. Nevertheless, they serve as an indication of the diversity of problems
that are being addressed
The RFID PIA – developed by industry, agreed by regulators
This chapter discusses the privacy impact assessment (PIA) framework endorsed
by the European Commission on February 11th, 2011. This PIA, the first to receive the
Commission's endorsement, was developed to deal with privacy challenges associated with
the deployment of radio frequency identification (RFID) technology, a key building block of
the Internet of Things. The goal of this chapter is to present the methodology and key
constructs of the RFID PIA Framework in more detail than was possible in the official text.
RFID operators can use this article as a support document when they conduct PIAs and need
to interpret the PIA Framework. The chapter begins with a history of why and how the PIA
Framework for RFID came about. It then proceeds with a description of the endorsed PIA
process for RFID applications and explains in detail how this process is supposed to function.
It provides examples discussed during the development of the PIA Framework. These
examples reflect the rationale behind and evolution of the text's methods and definitions. The
chapter also provides insight into the stakeholder debates and compromises that have
important implications for PIAs in general.Series: Working Papers on Information Systems, Information Business and Operation
Microdata protection through approximate microaggregation
Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of 658000 queries by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging
concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-anonymity provides a simple and efficient approach to protect private individual information and is gaining increasing popularity. k-anonymity requires that every record in the microdata table released be indistinguishably related to no fewer than k respondents.
In this paper, we apply the concept of entropy to propose a distance metric to evaluate the amount of mutual information among records in microdata, and propose a method of constructing dependency tree to find the key attributes, which we then use to process approximate microaggregation. Further, we adopt this new microaggregation technique to study -anonymity problem, and an efficient algorithm is developed. Experimental results show that the proposed microaggregation technique is efficient and effective in the terms of running time and information loss
A Survey on Wireless Sensor Network Security
Wireless sensor networks (WSNs) have recently attracted a lot of interest in
the research community due their wide range of applications. Due to distributed
nature of these networks and their deployment in remote areas, these networks
are vulnerable to numerous security threats that can adversely affect their
proper functioning. This problem is more critical if the network is deployed
for some mission-critical applications such as in a tactical battlefield.
Random failure of nodes is also very likely in real-life deployment scenarios.
Due to resource constraints in the sensor nodes, traditional security
mechanisms with large overhead of computation and communication are infeasible
in WSNs. Security in sensor networks is, therefore, a particularly challenging
task. This paper discusses the current state of the art in security mechanisms
for WSNs. Various types of attacks are discussed and their countermeasures
presented. A brief discussion on the future direction of research in WSN
security is also included.Comment: 24 pages, 4 figures, 2 table
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
- …