64 research outputs found

    US Data Access and the Commission for Evidence-based Policymaking

    Introduction: In September 2017, the bipartisan Commission for Evidence-based Policymaking released twenty-two recommendations to improve secure data access for evidence-building activities involving population-level government files. Many of the files are siloed in government agencies. The Commission deliberated over eighteen months to understand the risks and barriers to broader data use. Objectives and Approach: I will describe the Commission’s charge and review its recommendations in the context of US laws and privacy debates. I will compare the report’s recommendations and implications to laws and initiatives in other countries. The report calls for the establishment of a National Secure Data Service (NSDS), which has the potential to transform the data sharing environment for federal agencies, policy makers, and researchers. The report suggests more extensive use of differential privacy and secure multiparty computation to protect privacy. I will describe how the current environment could change depending on how the recommendations are implemented. Results: The Commission was established under one administration, but the recommendations were released under another. Despite the political and budget uncertainty in Washington, a bill was introduced and passed in the House in November 2017 to implement some recommendations. I will summarize the actions to be taken if the bill becomes law, including directives on learning agendas to prioritize and coordinate evidence-building activities across government, the roles of chief evaluation and chief data officers, and formation of an advisory committee to plan an NSDS. I will describe benefits that could follow from directives in the bill, including transparency about uses of administrative data, development of guidance to assess the risk when combining data sources, and minimization of the risk of publicly releasing de-identified data. Conclusion/Implications: The US may develop a national secure data service to support evaluations and policymaking. The recommendations are akin to the UK Data Service. Some recommendations are straightforward, others need years of planning and technical breakthroughs, and all require political buy-in and funding.

    The Differential Privacy Corner: What has the US Backed Itself Into?

    An expanding body of data privacy research reveals that computational advances and ever-growing amounts of publicly retrievable data increase re-identification risks. Because of this, data publishers are realizing that traditional statistical disclosure limitation methods may not protect privacy. This paper discusses the use of differential privacy at the US Census Bureau to protect the published results of the 2020 census. We first discuss the legal framework under which the Census Bureau intends to use differential privacy. The Census Act in the US states that the agency must keep information confidential, avoiding “any publication whereby the data furnished by any particular establishment or individual under this title can be identified.” The fact that Census may release fewer statistics in 2020 than in 2010 is leading scholars to parse the meaning of identification and reevaluate the agency’s responsibility to balance data utility with privacy protection. We then describe technical aspects of the application of differential privacy in the U.S. Census. This data collection is enormously complex and serves a wide variety of users and uses -- 7.8 billion statistics were released using the 2010 US Census. This complexity strains the application of differential privacy to ensure appropriate geographic relationships, respect legal requirements for certain statistics to be free of noise infusion, and provide information for detailed demographic groups. We end by discussing the prospects of applying formal mathematical privacy to other information products at the Census Bureau. At present, techniques exist for applying differential privacy to descriptive statistics, histograms, and counts, but are less developed for more complex data releases including panel data, linked data, and vast person-level datasets. 
We expect the continued development of formally private methods to occur alongside discussions of what privacy means and the policy issues involved in trading off protection for accuracy.
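The abstract above notes that differential privacy techniques already exist for counts and histograms. As a rough illustration only, the sketch below adds Laplace noise to a histogram. It is not the Census Bureau's production method; the function name `noisy_histogram`, the example cell labels, and the assumption that each person contributes to exactly one cell (so the L1 sensitivity is 1) are all hypothetical choices made for this example.

```python
import random


def noisy_histogram(counts, epsilon, sensitivity=1.0, seed=None):
    """Return a copy of `counts` with Laplace noise added to every cell.

    Assumption (not from the source): each individual falls in exactly one
    cell, so adding or removing one person changes the histogram by at most
    `sensitivity` = 1 in L1 norm.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    rate = 1.0 / scale
    noisy = {}
    for cell, value in counts.items():
        # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, 1/rate).
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy[cell] = value + noise
    return noisy


if __name__ == "__main__":
    # Hypothetical block-level counts; a smaller epsilon means more noise.
    true_counts = {"block_1": 120, "block_2": 45, "block_3": 0}
    print(noisy_histogram(true_counts, epsilon=0.5, seed=7))
```

The noisy values are real numbers and can be negative; making them non-negative integers that stay consistent across geographic levels is the kind of post-processing complexity the abstract alludes to, and it is out of scope for this sketch.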

    Genomic and proteomic profiling of responses to toxic metals in human lung cells.

    Examining global effects of toxic metals on gene expression can be useful for elucidating patterns of biological response, discovering underlying mechanisms of toxicity, and identifying candidate metal-specific genetic markers of exposure and response. Using a 1,200 gene nylon array, we examined changes in gene expression following low-dose, acute exposures of cadmium, chromium, arsenic, nickel, or mitomycin C (MMC) in BEAS-2B human bronchial epithelial cells. Total RNA was isolated from cells exposed to 3 µM Cd(II) (as cadmium chloride), 10 µM Cr(VI) (as sodium dichromate), 3 µg/cm² Ni(II) (as nickel subsulfide), 5 µM or 50 µM As(III) (as sodium arsenite), or 1 µM MMC for 4 hr. Expression changes were verified at the protein level for several genes. Only a small subset of genes was differentially expressed in response to each agent: Cd, Cr, Ni, As (5 µM), As (50 µM), and MMC each differentially altered the expression of 25, 44, 31, 110, 65, and 16 individual genes, respectively. Few genes were commonly expressed among the various treatments. Only one gene was altered in response to all four metals (hsp90), and no gene overlapped among all five treatments. We also compared low-dose (5 µM, noncytotoxic) and high-dose (50 µM, cytotoxic) arsenic treatments, which, surprisingly, affected expression of almost completely nonoverlapping subsets of genes, suggesting a threshold switch from a survival-based biological response at low doses to a death response at high doses.

    Reconciling Parent-Child Relationships across US Administrative Datasets

    Introduction: Population data capture children, parents, relatives, and others moving in and out of households. The U.S. has seen falling marriage rates and increases in multigenerational households and complex families, young children living with grandparents, and adult children living with parents. Robust parent-child linkages are critical to understanding these demographic shifts. Objectives and Approach: We construct and validate parent-child linkages over a century to observe how U.S. households are changing over time. The three largest person-based datafiles in the U.S. are the decennial censuses, the Social Security Administration transaction file, and individual tax returns from the Internal Revenue Service. These sources operationalize relationships differently, capture data at various frequencies, and gather the data for unique purposes. We use probabilistic matching to observe and reconcile parent-child relationships across these sources. The data include a variety of personal identifiers including name, date of birth, parents’ names, address, and place of birth that support matching and validation. Results: We find that understanding the content, consistency, and coverage of the files before matching is critical for high quality linkages. The representativeness of the parent-child relationship file improves over time, with the weakest coverage for the Greatest Generation and the strongest coverage for Millennials. Coverage varies by source: tax data underrepresent non-white children and have duplicate records for SSNs, while names and dates of birth are missing from Census data. Multiple match rates differ among demographic groups and over time. In the matching process, the blocking variables rely on common variables across the population datasets. Our approach provides robust entity resolution for women, despite married-maiden name changes. We describe challenges due to data problems in old census records and validation changes in social security data. Conclusion/Implications: We conduct a successful reconciliation of parent-child relationships in U.S. population-level files. The project supports operational and research uses, such as the 2020 Census. We will extend this work using graph matching and will expand the method to validate other relationship links including spouses and siblings.
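The abstract above mentions probabilistic matching with blocking variables. The sketch below shows one common way this idea is implemented, a Fellegi-Sunter-style agreement weight computed only within blocks. It is not the authors' pipeline: the field names, the m/u probabilities in `FIELD_PARAMS`, the choice of date of birth as the blocking key, and the threshold are all hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical parameters: m = P(field agrees | true match),
# u = P(field agrees | non-match). Real values would be estimated from data.
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.90, 0.005),  # lower m leaves room for surname changes
    "birth_place": (0.90, 0.05),
}


def match_weight(a, b):
    """Sum log2 agreement/disagreement weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a.get(field) and a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            # Missing values are treated as disagreement in this sketch.
            total += math.log2((1 - m) / (1 - u))
    return total


def candidate_pairs(file_a, file_b, block_key):
    """Yield only the pairs that share the same blocking value."""
    index = defaultdict(list)
    for rec in file_b:
        index[rec[block_key]].append(rec)
    for rec in file_a:
        for other in index.get(rec[block_key], []):
            yield rec, other


def link(file_a, file_b, block_key="dob", threshold=5.0):
    """Return (id_a, id_b, weight) for pairs scoring at or above the threshold."""
    links = []
    for a, b in candidate_pairs(file_a, file_b, block_key):
        weight = match_weight(a, b)
        if weight >= threshold:
            links.append((a["id"], b["id"], round(weight, 2)))
    return links
```

With these illustrative parameters, a pair that agrees on first name and birthplace but disagrees on last name still scores above the threshold, which is one simple way a linkage can tolerate a married-maiden name change.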

    The Building Blocks of Interoperability. A Multisite Analysis of Patient Demographic Attributes Available for Matching.

    Background: Patient matching is a key barrier to achieving interoperability. Patient demographic elements must be consistently collected over time and region to be valuable elements for patient matching. Objectives: We sought to determine what patient demographic attributes are collected at multiple institutions in the United States and see how their availability changes over time and across clinical sites. Methods: We compiled a list of 36 demographic elements that stakeholders previously identified as essential patient demographic attributes that should be collected for the purpose of linking patient records. We studied a convenience sample of 9 health care systems from geographically distinct sites around the country. We identified changes in the availability of individual patient demographic attributes over time and across clinical sites. Results: Several attributes were consistently available over the study period (2005-2014) including last name (99.96%), first name (99.95%), date of birth (98.82%), gender/sex (99.73%), postal code (94.71%), and full street address (94.65%). Other attributes changed significantly from 2005-2014: Social security number (SSN) availability declined from 83.3% to 50.44% (p<0.0001). Email address availability increased from 8.94% to 54% (p<0.0001). Work phone number availability increased from 20.61% to 52.33% (p<0.0001). Conclusions: Overall, first name, last name, date of birth, gender/sex and address were widely collected across institutional sites and over time. Availability of emerging attributes such as email and phone numbers is increasing while SSN use is declining. Understanding the relative availability of patient attributes can inform strategies for optimal matching in healthcare.

    Self Curation, Social Partitioning, Escaping from Prejudice and Harassment: the Many Dimensions of Lying Online

    Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.

    Establishing an International Data Linkage Repository Workgroup Toward a Benchmarking Repository

    Introduction: Access to real data with diverse attributes is critical for effective development of any data analytic algorithm. Benchmarking data repositories have all been vital to the development of research communities focused on algorithm development. This work reports on the development of such a data repository for record linkage. Objectives and Approach: Establishing a common benchmarking repository of real data can propel a field to the next level of rigor by facilitating comparison of different algorithms, understanding what type of algorithms work best under certain real data conditions and problem domains, promoting transparency and replicability of research, and creating incentives for proper citations for contributions. In addition, benchmarking repositories can bring together the diverse stakeholders (e.g., computer scientists, statisticians, data custodians, data users including social, behavioural, economic, and health (SBEH) scientists) that can advance the field more effectively than could researchers from any single discipline. Results: In Fall 2016, international leaders in record linkage formed a Data Linkage Repository workgroup (DLRep) to establish a benchmarking data repository for record linkage. The workgroup is working in collaboration with The Inter-university Consortium for Political and Social Research (ICPSR) to host the data repository, which is planned for release in Summer 2018. The repository for record linkage research will house various types of real data that require linking with metadata, unique handles for citations, proposed algorithms for evaluation criteria, and a platform for posting, sharing, and comparing results as well as citations of relevant papers. Some datasets will have the gold standard published that researchers can evaluate their results against. Other datasets will gather results to build the gold standard as a community. Conclusion/Implications: Record linkage methodology is important to domains where data needs to be integrated from multiple sources, including diverse disciplines. Establishing an international interdisciplinary research community around a benchmark data linkage repository to validate and compare linkage algorithms is crucial to fully realizing the social benefits of data about people.
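The abstract above describes datasets with a published gold standard against which researchers can evaluate linkage results. A minimal sketch of such an evaluation is given below, using pairwise precision, recall, and F1. The function name `evaluate_links` and the representation of links as ordered `(id_a, id_b)` tuples are assumptions for illustration; the repository's actual evaluation criteria are not specified in the abstract.

```python
def evaluate_links(predicted, gold):
    """Compare predicted links with a gold standard.

    Both arguments are iterables of (id_a, id_b) tuples; the tuple order is
    assumed to be consistent between the two inputs.
    """
    predicted = set(predicted)
    gold = set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    gold = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    predicted = [("a1", "b1"), ("a2", "b3")]
    print(evaluate_links(predicted, gold))
```

Reporting all three numbers rather than a single accuracy figure matters for linkage because true matches are rare among all possible pairs, so an algorithm that links nothing would otherwise look deceptively accurate.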

    Seven-step framework to enhance practitioner explanations and parental understandings of research without prior consent in paediatric emergency and critical care trials

    Background: Alternatives to prospective informed consent enable the conduct of paediatric emergency and critical care trials. Research without prior consent (RWPC) involves practitioners approaching parents after an intervention has been given and seeking consent for their child to continue in the trial. As part of an embedded study in the 'Emergency treatment with Levetiracetam or Phenytoin in Status Epilepticus in children' (EcLiPSE) trial, we explored how practitioners described the trial and RWPC during recruitment discussions, and how well this information was understood by parents. We aimed to develop a framework to assist trial conversations in future paediatric emergency and critical care trials using RWPC. Methods: Qualitative methods embedded within the EcLiPSE trial processes, including audiorecorded practitioner-parent trial discussions and telephone interviews with parents. We analysed data using thematic analysis, drawing on the Realpe et al (2016) model for recruitment to trials. Results: We analysed 76 recorded trial discussions and conducted 30 parent telephone interviews. For 19 parents, we had recorded trial discussion and interview data, which were matched for analysis. Parental understanding of the EcLiPSE trial was enhanced when practitioners: provided a comprehensive description of trial aims; explained the reasons for RWPC; discussed uncertainty about which intervention was best; provided a balanced description of trial intervention; provided a clear explanation about randomisation; and provided an opportunity for questions. We present a seven-step framework to assist recruitment practice in trials involving RWPC. Conclusion: This study provides a framework to enhance recruitment practice and parental understanding in paediatric emergency and critical care trials involving RWPC. Further testing of this framework is required.

    Establishing a large prospective clinical cohort in people with head and neck cancer as a biomedical resource: head and neck 5000

    BACKGROUND: Head and neck cancer is an important cause of ill health. Survival appears to be improving but the reasons for this are unclear. They could include evolving aetiology, modifications in care, improvements in treatment or changes in lifestyle behaviour. Observational studies are required to explore survival trends and identify outcome predictors. METHODS: We are identifying people with a new diagnosis of head and neck cancer. We obtain consent that includes agreement to collect longitudinal data, store samples and record linkage. Prior to treatment we give participants three questionnaires on health and lifestyle, quality of life and sexual history. We collect blood and saliva samples, complete a clinical data capture form and request a formalin fixed tissue sample. At four and twelve months we complete further data capture forms and send participants further quality of life questionnaires. DISCUSSION: This large clinical cohort of people with head and neck cancer brings together clinical data, patient-reported outcomes and biological samples in a single co-ordinated resource for translational and prognostic research.

    Children must be protected from the tobacco industry's marketing tactics.
