Synthetic data in medical research
Introduction
Demand to access high quality data at the individual level for medical and healthcare research is growing. Electronic health record data collected on whole populations can help to generate real world evidence and can be used for a range of secondary purposes, including testing new hypotheses and developing and evaluating different methodological and statistical approaches. Secondary analysis of primary research data, such as from clinical trials,1 is also valuable—for example, to conduct meta-analyses of individual participant data. However, several complex privacy requirements make accessing these data challenging.2
Information contained in electronic health records or in clinical trial data is highly sensitive, and access to these datasets can be an expensive and lengthy process.3 Data privacy and protection regulations are the main barriers to accessing these data for healthcare and medical research.4 Anonymisation (where potentially identifiable variables are removed) is one way to make data available; however, intensive anonymisation can degrade the data to the extent that they are no longer fit for purpose.5 For example, adding random noise to the data reduces precision and leads to wider confidence intervals. Several reidentification attempts on anonymised data have been successful and have harmed the trust of the public and of regulators in such methods.6 7 For instance, one study showed that patients could be identified by matching information from publicly available patient level data with information obtained from newspapers, and then contacting those patients directly.6
The potential of clinical trial data and electronic health records of large populations to benefit medical and healthcare research makes seeking new approaches to data access imperative. One solution is to use so-called synthetic data, or artificial data, which provide a realistic representation of the original data source. Synthetic data look like the original data source but do not contain information on any real individuals. Synthetic data can attempt to preserve some of the statistical properties of the original data source (eg, distributions of continuous data, proportions of categorical data, correlations between variables, and other model parameters).
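Two of the statistical points above can be illustrated with a minimal sketch. It is not a method described in the article: the "original" data are themselves simulated, the variables (age and systolic blood pressure), the noise level, and the helper names `pearson`, `ci_half_width`, `fit_summary`, and `sample_synthetic` are all assumptions made for illustration, and the generator is a simple bivariate normal model. The sketch first shows that adding random noise widens a 95% confidence interval for a mean, then draws synthetic records that approximately reproduce the means, standard deviations, and correlation of the source data.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ci_half_width(xs):
    """Half-width of a normal-approximation 95% confidence interval for the mean."""
    return 1.96 * statistics.stdev(xs) / math.sqrt(len(xs))


def fit_summary(xs, ys):
    """Summarise two variables by their means, standard deviations, and correlation."""
    return {
        "mean_x": statistics.fmean(xs),
        "mean_y": statistics.fmean(ys),
        "sd_x": statistics.stdev(xs),
        "sd_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def sample_synthetic(summary, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal with the fitted summary."""
    rho = summary["rho"]
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = summary["mean_x"] + summary["sd_x"] * z1
        # Mixing z1 and z2 this way gives y a correlation of rho with x.
        y = summary["mean_y"] + summary["sd_y"] * (
            rho * z1 + math.sqrt(1 - rho**2) * z2
        )
        pairs.append((x, y))
    return pairs


rng = random.Random(0)

# Hypothetical "original" data (simulated here, not real patients).
ages = [rng.gauss(55, 12) for _ in range(5000)]
sbp = [100 + 0.6 * a + rng.gauss(0, 10) for a in ages]

# 1. Noise-based anonymisation: perturbing ages widens the interval for the mean.
noisy_ages = [a + rng.gauss(0, 20) for a in ages]
ci_original = ci_half_width(ages)
ci_noisy = ci_half_width(noisy_ages)

# 2. Synthetic data: sample new records from the fitted summary statistics.
summary = fit_summary(ages, sbp)
synthetic = sample_synthetic(summary, 5000, rng)
syn_summary = fit_summary([p[0] for p in synthetic], [p[1] for p in synthetic])

print(f"95% CI half-width, original ages: {ci_original:.3f}")
print(f"95% CI half-width, noisy ages:    {ci_noisy:.3f}")
print(f"correlation, original:  {summary['rho']:.3f}")
print(f"correlation, synthetic: {syn_summary['rho']:.3f}")
```

A model this simple preserves only the properties it was fitted to (here, first and second moments), and sampling from a fitted model does not by itself provide a formal privacy guarantee; both limitations would need to be assessed for any real dataset.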
Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier
As universities recognize the inherent value in the data they collect and
hold, they encounter unforeseen challenges in stewarding those data in ways
that balance accountability, transparency, and protection of privacy, academic
freedom, and intellectual property. Two parallel developments in academic data
collection are converging: (1) open access requirements, whereby researchers
must provide access to their data as a condition of obtaining grant funding or
publishing results in journals; and (2) the vast accumulation of 'grey data'
about individuals in their daily activities of research, teaching, learning,
services, and administration. The boundaries between research and grey data are
blurring, making it more difficult to assess the risks and responsibilities
associated with any data collection. Many sets of data, both research and grey,
fall outside privacy regulations such as HIPAA and FERPA, and outside definitions of personally identifiable information (PII). Universities
are exploiting these data for research, learning analytics, faculty evaluation,
strategic decisions, and other sensitive matters. Commercial entities are
besieging universities with requests for access to data or for partnerships to
mine them. The privacy frontier facing research universities spans open access
practices, uses and misuses of data, public records requests, cyber risk, and
curating data for privacy protection. This paper explores the competing values
inherent in data stewardship and makes recommendations for practice, drawing on
the pioneering work of the University of California in privacy and information
security, data governance, and cyber risk.
Comment: Final published version, Sept 30, 201