Securely measuring the overlap between private datasets with cryptosets

Author: Matlock Matthew
Rozenblit Leon
Swamidass S. Joshua
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.

Crossref

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central

Translational bioinformatics in mental health: open access data sources and computational biomarker discovery

Author: Bhuvaneshwar Krithika
Fultz Hollis Kate
Gagliardi Jane P
Jia Peilin
Ma Liang
Nagarajan Radhakrishnan
Rakesh Gopalkumar
Rozenblit Leon
Subbian Vignesh
Tenenbaum Jessica D
Visweswaran Shyam
Zhao Zhongming
Publication venue: Oxford University Press (OUP)
Publication date: 21/05/2019

Mental illness is increasingly recognized as both a significant cost to society and a significant area of opportunity for biological breakthrough. As -omics and imaging technologies enable researchers to probe molecular and physiological underpinnings of multiple diseases, opportunities arise to explore the biological basis for behavioral health and disease. From individual investigators to large international consortia, researchers have generated rich data sets in the area of mental health, including genomic, transcriptomic, metabolomic, proteomic, clinical and imaging resources. General data repositories such as the Gene Expression Omnibus (GEO) and Database of Genotypes and Phenotypes (dbGaP) and mental health (MH)-specific initiatives, such as the Psychiatric Genomics Consortium, MH Research Network and PsychENCODE represent a wealth of information yet to be gleaned. At the same time, novel approaches to integrate and analyze data sets are enabling important discoveries in the area of mental and behavioral health. This review will discuss and catalog into an organizing framework the increasingly diverse set of MH data resources available, using schizophrenia as a focus area, and will describe novel and integrative approaches to molecular biomarker discovery that make use of mental health data.

Funding: National Institutes of Health [UL1TR001117, R01LM012095, R01LM012806]. Open access article.

The University of Arizona

Open is as Open Does: Lessons from Running a “Professional Open Source” Company

Author: Julie Hawthorne (426220)
Leon Rozenblit (427382)
Publication date: 01/01/1897

In this presentation, Dr. Leon Rozenblit, Founder and CEO of Prometheus Research, describes the lessons learned from running a professional open source company. He covers the business models, core technologies, architectures, and open-source licensing decisions made over the 15 years Prometheus has been in business. Find out more about RexDB at http://www.rexdb.org, or download the source code at http://www.bitbucket.org/rexdb.

Biblioteca Virtual del Ministerio de Defensa

FigShare

Developing a Suite of Electronic Data Capture Applications Based on an Open-Source Instrument Definition Standard

Author: Julie Hawthorne (426220)
Leon Rozenblit (427382)

At the present moment, multiple research groups are configuring either identical or very similar forms for use in different electronic data capture (EDC) systems, resulting in wasted time, lack of consistency across projects and inefficiency. Ideally these various groups should be able to download form configuration tools from an open library of instrument definitions and reuse them in any common EDC application. This problem and vision for the future led us to develop an open-source, portable research instrument standard for mental health (PRISMH). We tested PRISMH with two of our own EDC applications and had positive results. We are now proceeding with developing translators to/from REDCap and to Qualtrics. If you want to develop translators to/from your favorite EDC app, join us! An open-source, revision-controlled instrument-definition library, based on a portable open standard, will save resources and will enable better data sharing and interoperability across research programs and institutions.

FigShare

Data management in clinical research: Synthesizing stakeholder perspectives.

Author: Farach Frank J
Johnson Stephen B
Pelphrey Kevin
Rozenblit Leon
Publication venue: Elsevier BV
Publication date: 01/04/2016

George Washington University: Health Sciences Research Commons (HSRC)

Improving Research Efficiency, Data Quality, and Data Utility through Integrated Data Management

Author: Frank Farach (425387)
Julie Hawthorne (426220)
Leon Rozenblit (427382)

Multidimensional data integration and data reuse are emerging challenges in psychological research. We critique several common but inadequate practices and introduce an integrated data management framework, in which data are centralized, cleaned up front, and made available via a query interface for maximum efficiency, quality, and reusability.

FigShare

Cryptosets stably estimate the overlap proportion between private datasets, no matter the dataset size, and with accuracy tunable by length.

Author: Leon Rozenblit (427382)
Matthew Matlock (695948)
S. Joshua Swamidass (695947)

Each column of figures corresponds to a different number of public IDs: 500, 1000 and 2000. The first row shows the results of an empirical study, demonstrating that the error (the spread of each data series) is stable across all dataset sizes. The second row shows the analytically derived 95% confidence intervals (Equation 9, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0117898#pone.0117898.e012), which closely match the distribution of empirical estimates and are stable across all dataset sizes. Also evident in these figures is that estimate accuracy is tuned by the length (the number of possible public IDs) of the cryptosets.
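The effect of length on accuracy can be explored with a small simulation. The sketch below is not the authors' experiment: it assumes an ideal hash, so public IDs are drawn uniformly at random instead of being computed from real private IDs, and it uses the Pearson correlation between two equal-size cryptosets as the estimate of their overlap proportion. The function names, the 50% overlap, and the trial count are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_estimate(n, overlap_fraction, length, rng):
    """One trial: two datasets of n items sharing overlap_fraction * n items.

    Assumption: an ideal hash, modeled by uniform random bin choices.
    """
    shared = int(n * overlap_fraction)
    a = [0] * length
    b = [0] * length
    for _ in range(shared):
        k = rng.randrange(length)  # a shared item lands in the same bin in both
        a[k] += 1
        b[k] += 1
    for _ in range(n - shared):
        a[rng.randrange(length)] += 1  # items unique to each dataset
        b[rng.randrange(length)] += 1  # are binned independently
    # For equal-size datasets the correlation approximates the overlap proportion.
    return pearson(a, b)


def spread(n, length, trials=40, seed=0):
    """Standard deviation of the estimate over repeated trials."""
    rng = random.Random(seed)
    estimates = [simulate_estimate(n, 0.5, length, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for length in (500, 1000, 2000):
        for n in (1000, 5000):
            print(f"length={length:5d} n={n:5d} spread={spread(n, length):.4f}")
```

Under these assumptions the printed spread should shrink as the length grows and change little between dataset sizes, which is the qualitative pattern the caption describes.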

FigShare

Cryptosets can measure the overlap between chemical collections.

Author: Leon Rozenblit (427382)
Matthew Matlock (695948)
S. Joshua Swamidass (695947)

In this figure we compare two molecular libraries, which have about 5000 scaffolds in common. The libraries' scaffolds are nearly evenly distributed across public IDs, but a subtle, statistically significant correlation demonstrates they overlap. The estimated overlap is quite good. Moreover, the privacy of the libraries is maintained. Within each public ID bin (representative examples shown for one bin), there are both scaffolds unique and common to each library, and there is no way to determine which are which from the cryptosets. Sharing overlaps between molecular libraries could help researchers know when it makes sense to screen a private molecule library with a biological assay.

FigShare

Cryptosets are shareable summaries of private data, from which estimates of overlap can be computed.

Author: Leon Rozenblit (427382)
Matthew Matlock (695948)
S. Joshua Swamidass (695947)

Cryptosets are constructed by using a cryptographic hash function to transform private IDs from a dataset into a limited number of public IDs, and then combining these public IDs into a histogram. From this histogram (about 1000 IDs long in practice), the overlap between private datasets can be estimated in a public space. The security of cryptosets relies on the fact that several private IDs map to each public ID. The estimates are based on the Pearson correlation between cryptosets, and can only measure overlap at a predetermined resolution.
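The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: SHA-256 stands in for the unspecified cryptographic hash, a modulo reduction assigns each digest to a public ID, and the overlap count is approximated as r * sqrt(|A| * |B|), where r is the Pearson correlation between the two histograms. That approximation follows from treating shared items as the only source of covariance between bins; the exact estimator is not given in this summary. The function names and the example IDs are hypothetical.

```python
import hashlib
import math


def cryptoset(private_ids, length=1000):
    """Histogram of public IDs for a collection of private ID strings.

    Assumptions: SHA-256 as the hash, modulo `length` as the reduction.
    """
    counts = [0] * length
    for private_id in private_ids:
        digest = hashlib.sha256(private_id.encode("utf-8")).digest()
        public_id = int.from_bytes(digest, "big") % length
        counts[public_id] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def estimate_overlap(cryptoset_a, cryptoset_b):
    """Approximate number of shared items, assuming overlap ~ r * sqrt(|A| * |B|)."""
    size_a, size_b = sum(cryptoset_a), sum(cryptoset_b)
    return pearson(cryptoset_a, cryptoset_b) * math.sqrt(size_a * size_b)


if __name__ == "__main__":
    # Two made-up datasets of 10,000 IDs each that share 5,000 IDs.
    dataset_a = [f"patient-{i}" for i in range(10_000)]
    dataset_b = [f"patient-{i}" for i in range(5_000, 15_000)]
    estimate = estimate_overlap(cryptoset(dataset_a), cryptoset(dataset_b))
    print(f"estimated overlap: {estimate:.0f} (true overlap: 5000)")
```

Only the two histograms need to be exchanged to compute the estimate. With 10,000 private IDs spread over 1000 public IDs, roughly ten private IDs fall in each bin, which is the many-to-one mapping the description relies on for security.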

FigShare