
    Implementing Informatics Tools with Data Management Plans for Disease Area Research

    Data Management Plans (DMPs) are essential to the research data life cycle, and to be effective they should be developed as part of the research program itself. For disease area research, integrating research community-recommended data standards at the point of collection increases the likelihood of data reuse. Informatics tools are required as part of DMPs so that data are findable, accessible, interoperable, and reusable. The US National Institutes of Health supports various disease area research programs and has recently finalized its Data Management and Sharing Policy. The policy highlights the importance of sharing data and metadata, including information on elements such as data types, standards, storage repositories, access, services, and tools used for a proposed research project. The present paper uses Traumatic Brain Injury (TBI) and Parkinson's Disease (PD) research as examples of where the elements of the policy are being supported. The software tools developed for the TBI and PD plans are available through the Biomedical Research Informatics Computing System. The Protocol and Form Research Management System (ProFoRMS) enables researchers to manage research protocols while collecting clinical data, and it supports automatic validation against the TBI and PD data dictionaries. Detailed information on the functionality of the software tools used for preserving data within the TBI and PD repositories is openly available on their respective websites.
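    As a hedged illustration of the kind of data-dictionary validation ProFoRMS performs, the short Python sketch below checks a clinical record against dictionary rules. The element names, types, ranges, and permissible values are invented for illustration; they are not taken from the actual NINDS TBI or PD data dictionaries.

    ```python
    # Hypothetical sketch of data-dictionary validation; the elements and
    # rules below are invented, not the real TBI/PD common data elements.
    DATA_DICTIONARY = {
        "GlasgowComaScaleTotal": {"type": int, "min": 3, "max": 15},
        "InjuryCause": {"type": str, "allowed": {"Fall", "MVC", "Assault", "Other"}},
    }

    def validate_record(record: dict) -> list[str]:
        """Return human-readable validation errors (empty list if valid)."""
        errors = []
        for element, rules in DATA_DICTIONARY.items():
            value = record.get(element)
            if value is None:
                errors.append(f"{element}: missing required element")
            elif not isinstance(value, rules["type"]):
                errors.append(f"{element}: expected {rules['type'].__name__}")
            elif "allowed" in rules and value not in rules["allowed"]:
                errors.append(f"{element}: '{value}' not a permissible value")
            elif "min" in rules and not rules["min"] <= value <= rules["max"]:
                errors.append(f"{element}: {value} outside [{rules['min']}, {rules['max']}]")
        return errors

    print(validate_record({"GlasgowComaScaleTotal": 22, "InjuryCause": "Fall"}))
    # ['GlasgowComaScaleTotal: 22 outside [3, 15]']
    ```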

    The EPIRARE proposal of a set of indicators and common data elements for the European platform for rare disease registration

    BACKGROUND: The European Union acknowledges the relevance of registries as key instruments for developing rare disease (RD) clinical research and improving patient care and health service (HS) planning, and it funded the EPIRARE project to improve standardization and data comparability among patient registries and to support new registries and data collections. METHODS: A reference list of patient registry-based indicators was prepared, building on the work of previous EU projects and on the platform stakeholders' information needs identified through the EPIRARE surveys and consultations. The variables necessary to compute these indicators were analysed for their scope and use and then organized into data domains. RESULTS: The reference indicators span disease surveillance, socio-economic burden, HS monitoring, research and product development, and policy equity and effectiveness. The variables necessary to compute these reference indicators have been selected and, with the exception of more sophisticated indicators for research and clinical care quality, they can be collected as common data elements (CDEs) across all rare diseases. They have been organized into data domains characterized by their contents and main goal, and a limited set of mandatory data elements has been defined, which allows case notification independently of the physician or the health service. CONCLUSIONS: The definition of a set of CDEs for the European platform for RD patient registration is the first step in promoting the use of common tools for the collection of comparable data. The proposed organization of the CDEs contributes to the completeness of case ascertainment, with the possible involvement of patients and patient associations in the registration process. This work is part of the activities of the project titled "Building Consensus and synergies for the EU Registration of Rare Disease Patients" (EPIRARE), funded by the European Commission within the framework of the Health Project, Work Plan 2010 (Grant no. 20101202).

    NHash: Randomized N-Gram Hashing for Distributed Generation of Validatable Unique Study Identifiers in Multicenter Research

    BACKGROUND: A unique study identifier serves as a key for linking research data about a study subject without revealing protected health information in the identifier. While sufficient for single-site and limited-scale studies, the use of common unique study identifiers has several drawbacks for large multicenter studies, where thousands of research participants may be recruited from multiple sites. An important property of study identifiers is error tolerance (validatability): inadvertent editing mistakes during their transmission and use will most likely result in invalid study identifiers. OBJECTIVE: This paper introduces a novel method, Randomized N-gram Hashing (NHash), for generating unique study identifiers in a distributed and validatable fashion in multicenter research. NHash has a unique set of properties: (1) it is a pseudonym serving the purpose of linking research data about a study participant for research purposes; (2) it can be generated automatically in a completely distributed fashion with virtually no risk of identifier collision; (3) it incorporates a set of cryptographic hash functions based on N-grams, combined with additional encryption techniques such as a shift cipher; and (4) it is validatable (error tolerant), in the sense that inadvertent edit errors will mostly result in invalid identifiers. METHODS: NHash consists of 2 phases. In the first phase, an intermediate string is generated using randomized N-gram hashing. This string is the concatenation of a collection of N-gram hashes f1, f2, ..., fk. The input for each function fi has 3 components: a random number r, an integer n, and input data m. The result, fi(r, n, m), is an n-gram of m with starting position s = (r mod |m|), where |m| is the length of m. The output of Phase 1 is the concatenation f1(r1, n1, m1), f2(r2, n2, m2), ..., fk(rk, nk, mk). In the second phase, the intermediate string is encrypted using techniques such as a shift cipher. The result of the encryption, concatenated with the random number r, is the final NHash study identifier. RESULTS: Experiments on a large synthesized dataset comparing NHash with random strings demonstrated a negligible probability of collision. We implemented NHash for the Center for SUDEP Research (CSR), a National Institute of Neurological Disorders and Stroke-funded Center Without Walls for Collaborative Research in the Epilepsies. This multicenter collaboration involves 14 institutions across the United States and Europe, bringing together extensive and diverse expertise to understand sudden unexpected death in epilepsy (SUDEP). CONCLUSIONS: The CSR Data Repository has successfully used NHash to link deidentified multimodal clinical data collected at participating CSR institutions, meeting all desired objectives of NHash.
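    As a hedged sketch of the two-phase construction described above, the Python fragment below generates an NHash-style identifier. It is one reading of the method, not the published implementation: it hashes each selected n-gram with SHA-256, uses a single shared random number r rather than per-function r_i, and substitutes a toy shift cipher for the unspecified encryption step.

    ```python
    import hashlib
    import random

    def ngram_hash(r: int, n: int, m: str) -> str:
        """One reading of f_i(r, n, m): take the n-gram of m starting at
        s = r mod |m| (wrapping around the end) and hash it cryptographically."""
        s = r % len(m)
        gram = (m + m)[s:s + n]  # wrap-around keeps the n-gram well defined
        return hashlib.sha256(gram.encode()).hexdigest()[:8]

    def shift_cipher(text: str, shift: int) -> str:
        """Phase 2: a toy shift cipher over the hex alphabet (illustrative only)."""
        hexdigits = "0123456789abcdef"
        return "".join(hexdigits[(hexdigits.index(c) + shift) % 16] for c in text)

    def nhash(fields: list[str], n: int = 3, shift: int = 5) -> str:
        """Concatenate f_1..f_k over the input fields, encrypt, and append r."""
        r = random.randrange(1, 10**6)  # single shared r; the paper allows r_i per f_i
        intermediate = "".join(ngram_hash(r, n, m) for m in fields)
        return shift_cipher(intermediate, shift) + str(r)

    # Each site can generate identifiers locally, with no central coordination.
    print(nhash(["JANEDOE", "19700101", "SITE07"]))
    ```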

    Circulation

    The National Heart, Lung, and Blood Institute convened a working group in January 2015 to explore issues related to an integrated data network for congenital heart disease research. The overall goal was to develop a common vision for how the rapidly increasing volumes of data captured across numerous sources can be managed, integrated, and analyzed to improve care and outcomes. This report summarizes the current landscape of congenital heart disease data, data integration methodologies used across other fields, key considerations for data integration models in congenital heart disease, and the short- and long-term vision and recommendations made by the working group. Funding: Intramural CDC HHS (CC999999); NIH/NHLBI (U10 HL109737).

    The autism inpatient collection: Methods and preliminary sample description

    © 2015 Siegel et al. Background: Individuals severely affected by autism spectrum disorder (ASD), including those with intellectual disability, expressive language impairment, and/or self-injurious behavior (SIB), are underrepresented in the ASD literature and in extant collections of phenotypic and biological data. An understanding of ASD's etiology and subtypes can only be as complete as the studied samples are representative. Methods: The Autism Inpatient Collection (AIC) is a multi-site study enrolling children and adolescents with ASD aged 4-20 years admitted to six specialized inpatient psychiatry units. Enrollment began in March 2014 and continues at a rate of over 400 children annually. Measures characterizing adaptive and cognitive functioning, communication, externalizing behaviors, emotion regulation, psychiatric co-morbidity, self-injurious behavior, parent stress, and parent self-efficacy are collected. ASD diagnosis is confirmed by the Autism Diagnostic Observation Schedule-2 (ADOS-2) and extensive inpatient observation. Biological samples from probands and their biological parents are banked and processed for DNA extraction and the creation of lymphoblastoid cell lines. Results: Sixty-one percent of eligible subjects were enrolled. The first 147 subjects were an average of 12.6 years old (SD 3.42, range 4-20); 26.5% were female, 74.8% Caucasian, and 81.6% non-Hispanic/non-Latino. Mean nonverbal intelligence quotient was 70.9 (SD 29.16, range 30-137) and mean adaptive behavior composite score was 55.6 (SD 12.9, range 27-96). A majority of subjects (52.4%) were non- or minimally verbal. The average Aberrant Behavior Checklist - Irritability Subscale score was 28.6, well above the typical threshold for clinically concerning externalizing behaviors, and 26.5% of the sample engaged in SIB. Females had more frequent and severe SIB than males. Conclusions: Preliminary data indicate that the AIC has a rich representation of the portion of the autism spectrum that is understudied and underrepresented in extant data collections. More than half of the sample is non- or minimally verbal, over 40% have intellectual disability, and over one quarter exhibit SIB. The AIC is a substantial new resource for study of the full autism spectrum, which will augment existing data on higher-functioning cohorts and facilitate the identification of genetic subtypes and novel treatment targets. The AIC investigators welcome collaborations with other investigators, and access to the AIC phenotypic data and biosamples may be requested through the Simons Foundation (www.sfari.org).

    Cardiac biomarkers in pediatric cardiomyopathy: Study design and recruitment results from the Pediatric Cardiomyopathy Registry

    Background: Cardiomyopathies are a rare cause of pediatric heart disease, but they are one of the leading causes of heart failure admissions, sudden death, and need for heart transplant in childhood. Reports from the Pediatric Cardiomyopathy Registry (PCMR) have shown that almost 40% of children presenting with symptomatic cardiomyopathy either die or undergo heart transplant within 2 years of presentation. Little is known regarding circulating biomarkers as predictors of outcome in pediatric cardiomyopathy. Study Design: The Cardiac Biomarkers in Pediatric Cardiomyopathy (PCM Biomarkers) study is a multicenter prospective study conducted by the PCMR investigators to identify serum biomarkers for predicting outcome in children with dilated cardiomyopathy (DCM) and hypertrophic cardiomyopathy (HCM). Patients less than 21 years of age with either DCM or HCM were eligible. Those with DCM were enrolled into cohorts based on time from cardiomyopathy diagnosis, categorized as new onset or chronic. Clinical endpoints included sudden death and progressive heart failure. Results: There were 288 children, diagnosed at a mean age of 7.2±6.3 years, who enrolled in the PCM Biomarkers Study at a median time from diagnosis to enrollment of 1.9 years. There were 80 children enrolled in the new onset DCM cohort, defined as diagnosis at or within 12 months prior to enrollment. The median age at diagnosis for the new onset DCM cohort was 1.7 years and the median time from diagnosis to enrollment was 0.1 years. There were 141 children enrolled with either chronic DCM or chronic HCM, defined as children ≥2 years from diagnosis to enrollment. Among children with chronic cardiomyopathy, the median age at diagnosis was 3.4 years and the median time from diagnosis to enrollment was 4.8 years. Conclusion: The PCM Biomarkers study is evaluating the predictive value of serum biomarkers to aid in the prognosis and management of children with DCM and HCM. The results will provide valuable information in an area where pediatric data are lacking. Clinical Trial Registration: NCT01873976 https://clinicaltrials.gov/ct2/show/NCT01873976?term=PCM+Biomarker&rank=

    Record linkage of population-based cohort data from minors with national register data: A scoping review and comparative legal analysis of four European countries

    Funding Information: We would like to acknowledge Evert-Ben van Veen from the MLC Foundation, Dagelijkse Groenmarkt 2, 2513 AL Den Haag, the Netherlands; the country-specific text on the Netherlands is based on his contribution. Publisher Copyright: © 2021 Doetsch JN et al. Background: The GDPR was implemented to build an overarching framework for personal data protection across the EU/EEA. Linkage of data directly collected from cohort participants, potentially serving as a prominent tool for health research, must respect data protection rules and privacy rights. Our objective was to investigate the legal possibilities of linking cohort data of minors with routinely collected education and health data, comparing EU/EEA member states. Methods: A comparative legal analysis and scoping review were conducted of openly accessible published laws and regulations in EUR-Lex and national law databases on the GDPR's implementation in Portugal, Finland, Norway, and the Netherlands, and on the connected national regulations concerning record linkage for health research, implemented up until April 30, 2021. Results: The GDPR does not ensure total uniformity in data protection legislation across member states, offering flexibility for national legislation. Exceptions to process personal data, e.g., for public interest and scientific research, must be laid down in EU/EEA or national law. Differences in national interpretation caused obstacles in cross-national research and record linkage: Portugal requires written consent and ethical approval; Finland allows linkage mostly without consent through the national Social and Health Data Permit Authority; Norway allows it when based on a regional ethics committee's approval and adequate information technology safeguarding confidentiality; the Netherlands mainly bases linkage on an opt-out system and a Data Protection Impact Assessment. Conclusions: Though the GDPR is the most important legal framework, the execution of national legislation matters most when linking cohort data with routinely collected health and education data. As national interpretation varies, legal intervention balancing the individual right to informational self-determination and the public good is urgently needed for health research. More harmonization across the EU/EEA could be helpful, but it should not be detrimental in those member states that have already opened a leeway for registries and research for the public good without explicit consent.

    Privacy preserving linkage and sharing of sensitive data

    Sensitive data, such as personal and business information, is collected by many service providers nowadays. This data is considered a rich source of information for research purposes that could benefit individuals, researchers, and service providers. However, because of the sensitivity of such data, privacy concerns, legislation, and conflicts of interest, data holders are reluctant to share their data with others. Data holders typically filter out or obliterate privacy-related sensitive information from their data before sharing it, which limits the utility of this data and affects the accuracy of research. Such practice protects individuals' privacy; however, it prevents researchers from linking records belonging to the same individual across different sources. This is commonly referred to as the record linkage problem by the healthcare industry. In this dissertation, our main focus is on designing and implementing efficient privacy preserving methods that will encourage sensitive information sources to share their data with researchers without compromising the privacy of the clients or affecting the quality of the research data. The proposed solution should be scalable and efficient for real-world deployments and provide good privacy assurance. While this problem has been investigated before, most of the proposed solutions were either partial, inaccurate, or impractical, and therefore subject to further improvement. We have identified several issues and limitations in the state-of-the-art solutions and provided a number of contributions that improve upon them. Our first contribution is the design of a privacy preserving record linkage protocol using a semi-trusted third party. The protocol allows a set of data publishers (data holders) who compete with each other to share sensitive information with subscribers (researchers) while preserving the privacy of their clients and without sharing encryption keys. Our second contribution is the design and implementation of a probabilistic privacy preserving record linkage protocol that accommodates discrepancies and errors in the data, such as typos. This work builds upon the previous work by linking records that are similar, where the similarity range is formally defined. Our third contribution is a protocol that performs information integration and sharing without third-party services. We use garbled circuits secure computation to design and build a system that performs record linkage between two parties without sharing their data. Our design uses Bloom filters as inputs to the garbled circuits and performs a probabilistic record linkage using the Dice coefficient similarity measure. As garbled circuits are known for their expensive computations, we propose new approaches that reduce the computation overhead needed to achieve a given level of privacy. We built a scalable record linkage system using garbled circuits that could be deployed in a distributed computation environment like the cloud, and evaluated its security and performance. One of the performance issues in linking large datasets is the amount of secure computation needed to compare every pair of records across the linked datasets to find all possible record matches. To reduce the amount of computation, a method known as blocking is used to filter out as many as possible of the record pairs that will not match, limiting the comparison to a subset of the record pairs (called candidate pairs) that possibly match. Most of the current blocking methods either require the parties to share blocking keys (called block identifiers), extracted from the domain of some record attributes (termed blocking variables), or to share reference data points and group their records around these points using some similarity measure. Though these methods reduce the computation substantially, they leak too much information about the records within each block. Toward this end, we proposed a novel privacy preserving approximate blocking scheme that allows parties to generate the list of candidate pairs with high accuracy while protecting the privacy of the records in each block. Our scheme is configurable, such that the desired levels of performance and accuracy can be achieved according to the required level of privacy. We analyzed the accuracy and privacy of our scheme, implemented a prototype, and experimentally evaluated its accuracy and performance against different levels of privacy.
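    As a hedged illustration of the Bloom filter and Dice coefficient building blocks named in the abstract, the Python sketch below encodes field values as bigram Bloom filters and scores their similarity in plaintext. The filter size, hash count, and tokenization are illustrative assumptions, and the garbled-circuit evaluation that the dissertation performs on these inputs is not reproduced here.

    ```python
    import hashlib

    FILTER_BITS = 256   # illustrative parameters, not the dissertation's
    NUM_HASHES = 4

    def bigrams(value: str) -> set[str]:
        """Split a field into character bigrams, padded at the boundaries."""
        padded = f"_{value.lower()}_"
        return {padded[i:i + 2] for i in range(len(padded) - 1)}

    def bloom_encode(value: str) -> set[int]:
        """Map each bigram to NUM_HASHES bit positions (set-of-set-bits view)."""
        bits = set()
        for gram in bigrams(value):
            for seed in range(NUM_HASHES):
                digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
                bits.add(int.from_bytes(digest[:4], "big") % FILTER_BITS)
        return bits

    def dice_similarity(a: set[int], b: set[int]) -> float:
        """Dice coefficient over set bits: 2|A ∩ B| / (|A| + |B|)."""
        return 2 * len(a & b) / (len(a) + len(b))

    # Typos change few bigrams, so similar names still score highly.
    print(dice_similarity(bloom_encode("Jonathan"), bloom_encode("Johnathan")))  # high
    print(dice_similarity(bloom_encode("Jonathan"), bloom_encode("Margaret")))   # low
    ```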