7 research outputs found

    Spectral anonymization of data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 87-96).

    Data anonymization is the process of conditioning a dataset such that no sensitive information can be learned about any specific individual, yet valid scientific analysis can still be performed on it. It is not sufficient to simply remove identifying information, because the remaining data may be enough to infer the individual source of a record (a reidentification disclosure) or to otherwise learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None has been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. This dissertation makes the new observation that the anonymizing operations do not need to be made in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses the preservation of the multivariate probability distribution. Finally, I propose the demanding reference standard of nonparticipation in the study to define adequate privacy protection. I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state-of-the-art methods merely by a change of basis. The second example demonstrates avoiding the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants and preserves at least 4th-order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.

    by Thomas Anton Lasko. Ph.D.
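
    A minimal sketch of the core idea, assuming a purely numeric dataset: obtain a spectral basis from the eigenvectors of the covariance matrix (here via SVD of the centered data), perform independent cell swapping within each spectral coordinate, and map the result back to the original variables. This illustrates the change-of-basis trick only, not the dissertation's exact algorithms, and all names below are my own.

```python
import numpy as np

def spectral_cell_swap(X, seed=None):
    """Toy cell swapping in a spectral (eigenvector) basis.

    Center the data, project onto the principal-component basis,
    independently permute each spectral coordinate across records,
    then map the swapped coordinates back to the original variables.
    """
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Right singular vectors = eigenvectors of the covariance matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                    # coordinates in the spectral basis
    for j in range(scores.shape[1]):      # swap values within each coordinate
        scores[:, j] = rng.permutation(scores[:, j])
    return scores @ Vt + mu               # back to the original basis

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)
    X_anon = spectral_cell_swap(X, seed=1)
    print(np.cov(X, rowvar=False).round(2))       # original covariance
    print(np.cov(X_anon, rowvar=False).round(2))  # approximately preserved
```

    Because principal-component coordinates are mutually uncorrelated, permuting each one independently approximately preserves the covariance structure while breaking the link between any coordinate value and the record it came from.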

    Routinely collected data for randomized trials: promises, barriers, and implications

    This work was supported by Stiftung Institut für klinische Epidemiologie. The Meta-Research Innovation Center at Stanford University is funded by a grant from the Laura and John Arnold Foundation. The funders had no role in the design and conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript or its submission for publication. Peer reviewed. Publisher PDF.

    The Use of Routinely Collected Data in Clinical Trial Research

    Randomized controlled trials (RCTs) are the gold standard for assessing the effects of medical interventions, but they also pose many challenges, including the often high costs of conducting them and a potential lack of generalizability of their findings. The recent increase in the availability of so-called routinely collected data (RCD) sources has led to great interest in applying them to support RCTs in an effort to increase the efficiency of conducting clinical trials. We define all RCTs augmented by RCD in any form as RCD-RCTs. A major subset of RCD-RCTs are performed at the point of care using electronic health records (EHRs) and are referred to as point-of-care research (POC-R). RCD-RCTs offer several advantages over traditional trials regarding patient recruitment, data collection, and beyond. Using highly standardized EHR and registry data makes it possible to assess patient characteristics for trial eligibility and to examine treatment effects through routinely collected endpoints or by linkage to other data sources such as mortality registries. Thus, RCD can be used to augment traditional RCTs by providing a sampling framework for patient recruitment and by directly measuring patient-relevant outcomes. The result of these efforts is the generation of real-world evidence (RWE). Nevertheless, the use of RCD in clinical research brings novel methodological challenges, and frequently discussed issues related to data quality need to be considered for RCD-RCTs. Some of the limitations surrounding RCD use in RCTs relate to data quality, data availability, ethical and informed-consent challenges, and a lack of endpoint adjudication, all of which may lead to uncertainty about the validity of their results. The purpose of this thesis is to help fill the aforementioned research gaps in RCD-RCTs, encompassing tasks such as assessing their current application in clinical research and evaluating the methodological and technical challenges in performing them. Furthermore, it aims to assess the reporting quality of published reports on RCD-RCTs.
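
    As a deliberately simplified illustration of the two uses described above, eligibility screening and outcome ascertainment by registry linkage, the sketch below filters a hypothetical EHR extract against invented eligibility criteria and derives a mortality endpoint with a deterministic merge. All column names, thresholds, and the linkage key are assumptions for illustration and do not come from the thesis.

```python
import pandas as pd

# Hypothetical EHR extract and mortality registry; every name and value is illustrative.
ehr = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "age": [67, 54, 72, 48],
    "hba1c": [8.1, 6.4, 9.0, 7.5],
    "on_insulin": [False, False, True, False],
})
registry = pd.DataFrame({
    "patient_id": [1, 3],
    "death_date": pd.to_datetime(["2022-03-01", "2021-11-15"]),
})

# 1) Screen routinely collected characteristics against (invented) eligibility criteria.
eligible = ehr[ehr["age"].between(50, 75) & (ehr["hba1c"] >= 7.5) & ~ehr["on_insulin"]]

# 2) Ascertain a mortality endpoint by deterministic linkage on a shared identifier.
outcomes = eligible.merge(registry, on="patient_id", how="left")
outcomes["died"] = outcomes["death_date"].notna()
print(outcomes[["patient_id", "age", "died"]])
```

    In practice, linkage is often probabilistic or mediated by a trusted third party, and registry coverage and coding quality would need to be assessed before such endpoints are relied upon.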

    Spherical microaggregation: anonymizing sparse vector spaces

    Unstructured text is a very popular data type that is still widely unexplored in the privacy-preserving data mining field. We consider the problem of providing public information about a set of confidential documents. To that end, we have developed a method to protect a Vector Space Model (VSM) so that it can be made public even if the documents it represents are private. The method is inspired by microaggregation, a popular protection method from statistical disclosure control, and adapted to work with sparse and high-dimensional data sets.
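
    A minimal sketch of the general approach, not the paper's exact algorithm: documents are represented as TF-IDF vectors, greedily grouped into clusters of at least k records under cosine (spherical) distance, and each vector is replaced by its group's normalized centroid, so no released vector corresponds to a single confidential document. The grouping heuristic and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

def spherical_microaggregate(X, k):
    """Replace each group of at least k unit vectors with its normalized centroid."""
    X = normalize(X.toarray() if hasattr(X, "toarray") else X)  # unit-length rows
    X_anon = np.zeros_like(X)
    remaining = list(range(X.shape[0]))          # assumes at least k records in total
    while len(remaining) >= 2 * k:
        R = X[remaining]
        center = normalize(R.mean(axis=0, keepdims=True))
        seed = int(np.argmin(R @ center.T))      # record least similar to the center
        sims = R @ R[seed]                       # cosine similarity to that seed
        group = [remaining[i] for i in np.argsort(-sims)[:k]]  # seed + its k-1 neighbors
        X_anon[group] = normalize(X[group].mean(axis=0, keepdims=True))
        remaining = [r for r in remaining if r not in group]
    # The last k to 2k-1 records form the final group.
    X_anon[remaining] = normalize(X[remaining].mean(axis=0, keepdims=True))
    return X_anon

docs = ["private oncology report", "oncology follow-up note",
        "financial statement for audit", "quarterly financial report",
        "trial eligibility letter", "patient discharge summary"]
X_anon = spherical_microaggregate(TfidfVectorizer().fit_transform(docs), k=2)
print(np.unique(X_anon.round(6), axis=0).shape[0])  # distinct released vectors (3 for 6 docs)
```

    Each released vector stands in for at least k confidential documents, which is the microaggregation guarantee carried over to the sparse, high-dimensional vector space.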

    Collaborative Privacy-Preserving Analysis of Oncological Data using Multiparty Homomorphic Encryption

    Real-world healthcare data sharing is instrumental in constructing broader-based and larger clinical data sets that may improve clinical decision-making research and outcomes. Stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their data sets, and control over the usage of their data. Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without intermediate decryptions, so that analytic results are obtained without revealing the raw data. This work presents a toolset for collaborative privacy-preserving analysis of oncological data using multiparty FHE. Our toolset supports survival analysis, logistic regression training, and several common descriptive statistics. We demonstrate on oncological data sets that the toolset achieves high accuracy and practical performance that scales well to larger data sets. As part of this work, we propose a novel cryptographic protocol for interactive bootstrapping in multiparty FHE, which is of independent interest. The toolset we develop is general-purpose and can be applied to other collaborative medical and healthcare application domains.
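
    The multiparty FHE used by the toolset is too involved to reproduce here, but the underlying idea, computing on ciphertexts so that raw patient-level values are never revealed during analysis, can be shown with a toy additively homomorphic (Paillier-style) scheme. This is a self-contained sketch of homomorphic pooling of per-site event counts, not the paper's protocol, and the key size is far too small for real use.

```python
import math
import random

def paillier_keygen(bits=256):
    """Toy Paillier key generation (NOT secure: tiny keys, Fermat primality test only)."""
    def rand_prime(b):
        while True:
            p = random.getrandbits(b) | (1 << (b - 1)) | 1
            if all(pow(a, p - 1, p) == 1 for a in (2, 3, 5, 7, 11)):
                return p
    p, q = rand_prime(bits // 2), rand_prime(bits // 2)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)       # inverse of L(g^lam mod n^2)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, priv = paillier_keygen()
site_counts = [12, 7, 23]                    # per-site event counts, never shared in plaintext
ciphertexts = [encrypt(pub, m) for m in site_counts]
# Homomorphic addition: multiplying ciphertexts adds the underlying plaintexts.
encrypted_total = math.prod(ciphertexts) % (pub[0] ** 2)
assert decrypt(priv, encrypted_total) == sum(site_counts)
print(decrypt(priv, encrypted_total))        # 42, computed without seeing any single site's count
```

    In multiparty FHE, the decryption capability itself is typically shared across the collaborating parties, so no single institution can decrypt intermediate or final results on its own.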

    Spectral Anonymization of Data

    No full text