
    Desiderata for the development of next-generation electronic health record phenotype libraries

    Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. Results: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
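    To make the notion of a portable, reproducible phenotype definition concrete, the sketch below encodes a hypothetical definition as a named, versioned code list and applies it to EHR diagnosis records. The structure, field names, and codes are illustrative assumptions, not a format prescribed by the paper.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal phenotype definition: a named, versioned set of
# diagnosis codes. A real library entry would also carry provenance,
# validation results, and richer logic than a flat code list.
@dataclass
class PhenotypeDefinition:
    name: str
    version: str
    coding_system: str                 # e.g. "ICD-10" (assumed)
    codes: set = field(default_factory=set)

def extract_cohort(definition, diagnosis_records):
    """Return patient IDs whose records satisfy the definition.

    `diagnosis_records` is assumed to be an iterable of
    (patient_id, coding_system, code) tuples from an EHR repository.
    """
    return {
        patient_id
        for patient_id, system, code in diagnosis_records
        if system == definition.coding_system and code in definition.codes
    }

# Example with made-up records and codes.
t2dm = PhenotypeDefinition(
    name="Type 2 diabetes mellitus",
    version="1.0",
    coding_system="ICD-10",
    codes={"E11.0", "E11.9"},
)
records = [("p1", "ICD-10", "E11.9"), ("p2", "ICD-10", "I10")]
print(extract_cohort(t2dm, records))   # {'p1'}
```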

    Optimising medication data collection in a large-scale clinical trial

    © 2019 Lockery et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Objective: Pharmaceuticals play an important role in clinical care. However, in community-based research, medication data are commonly collected as unstructured free-text, which is prohibitively expensive to code for large-scale studies. The ASPirin in Reducing Events in the Elderly (ASPREE) study developed a two-pronged framework to collect structured medication data for 19,114 individuals. ASPREE provides an opportunity to determine whether medication data can be cost-effectively collected and coded en masse from the community using this framework. Methods: The ASPREE framework of a type-to-search box with automated coding and linked free-text entry was compared to the traditional method of free-text only collection and post hoc coding. Reported medications were classified according to their method of collection and analysed by Anatomical Therapeutic Chemical (ATC) group. Relative cost of collecting medications was determined by calculating the time required for database set-up and medication coding. Results: Overall, 122,910 participant structured medication reports were entered using the type-to-search box and 5,983 were entered as free-text. Free-text data contributed 211 unique medications not present in the type-to-search box. Spelling errors and unnecessary provision of additional information were among the top reasons why medications were reported as free-text. The cost per medication using the ASPREE method was approximately USD 0.03, compared with USD 0.20 per medication for the traditional method. Conclusion: Implementation of this two-pronged framework is a cost-effective alternative to free-text only data collection in community-based research. Higher initial set-up costs of this combined method are justified by long-term cost effectiveness and the scientific potential for analysis and discovery gained through collection of detailed, structured medication data.
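    As a rough illustration of the two-pronged idea (a structured type-to-search lookup, with a free-text fallback for entries that cannot be resolved), consider the sketch below. The dictionary entries, ATC codes, and function names are hypothetical and are not drawn from the ASPREE implementation.

```python
# Hypothetical medication dictionary mapping display names to ATC codes;
# the entries here are illustrative only.
MEDICATION_DICTIONARY = {
    "aspirin": "B01AC06",
    "atorvastatin": "C10AA05",
    "metformin": "A10BA02",
}

def record_medication(entry: str):
    """Return a structured (name, ATC code) report when the typed entry
    resolves against the dictionary; otherwise fall back to a free-text
    report for later manual coding (the second prong of the framework)."""
    key = entry.strip().lower()
    # Exact match, as when a type-to-search selection is confirmed.
    if key in MEDICATION_DICTIONARY:
        return {"type": "structured", "name": key, "atc": MEDICATION_DICTIONARY[key]}
    # A single unambiguous prefix match approximates incremental suggestions.
    candidates = [name for name in MEDICATION_DICTIONARY if name.startswith(key)]
    if len(candidates) == 1:
        name = candidates[0]
        return {"type": "structured", "name": name, "atc": MEDICATION_DICTIONARY[name]}
    # Misspellings or unlisted drugs are stored as free text instead.
    return {"type": "free_text", "raw": entry}

print(record_medication("Aspirin"))       # structured report with ATC B01AC06
print(record_medication("asprin 100mg"))  # free-text fallback
```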

    Genetic variation in five genes important in telomere biology and risk for breast cancer

    Telomeres, consisting of TTAGGG nucleotide repeats and a protein complex at chromosome ends, are critical for maintaining chromosomal stability. Genomic instability, following telomere crisis, may contribute to breast cancer pathogenesis. Many genes critical in telomere biology have limited nucleotide diversity; thus, single nucleotide polymorphisms (SNPs) in this pathway could contribute to breast cancer risk. In a population-based study of 1,995 breast cancer cases and 2,296 controls from Poland, 24 SNPs representing common variation in POT1, TEP1, TERF1, TERF2 and TERT were genotyped. We did not identify any significant associations between individual SNPs or haplotypes and breast cancer risk; however, data suggested that three correlated SNPs in TERT (−1381C>T, −244C>T, and Ex2-659G>A) may be associated with reduced risk of breast cancer among individuals with a family history of breast cancer (odds ratios 0.73, 0.66, and 0.57; 95% confidence intervals 0.53–1.00, 0.46–0.95, and 0.39–0.84, respectively). In conclusion, our data do not support substantial overall associations between SNPs in telomere pathway genes and breast cancer risk. Intriguing associations with variants in TERT among women with a family history of breast cancer warrant follow-up in independent studies.
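    The quoted odds ratios and 95% confidence intervals are the standard case-control association measures; a minimal sketch of how such an estimate is computed from a 2×2 exposure-by-status table (Woolf's log-normal approximation) is shown below. The counts are invented for illustration, and the study's own estimates would typically come from logistic regression with covariate adjustment.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-normal) confidence interval:
    OR = (a*d)/(b*c), SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d)."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)

# Invented counts: variant carriers vs non-carriers among cases and controls.
print(odds_ratio_ci(120, 1875, 180, 2116))   # OR ≈ 0.75, 95% CI ≈ (0.59, 0.96)
```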

    Endometrial carcinoma risk among women diagnosed with endometrial hyperplasia: the 34-year experience in a large health plan

    Classifying endometrial hyperplasia (EH) according to the severity of glandular crowding (simple hyperplasia (SH) vs complex hyperplasia (CH)) and nuclear atypia (simple atypical hyperplasia (SAH) vs complex atypical hyperplasia (CAH)) should predict subsequent endometrial carcinoma risk, but data on progression are lacking. Our nested case–control study of EH progression included 138 cases, who were diagnosed with EH and then with carcinoma (1970–2003) at least 1 year (median, 6.5 years) later, and 241 controls, who were individually matched on age, date, and follow-up duration and counter-matched on EH classification. After centralised pathology panel and medical record review, we generated rate ratios (RRs) and 95% confidence intervals (CIs), adjusted for treatment and repeat biopsies. With disordered proliferative endometrium (DPEM) as the referent, atypical hyperplasia (AH) significantly increased carcinoma risk (RR=14, 95% CI, 5–38). Risk was highest 1–5 years after AH (RR=48, 95% CI, 8–294), but remained elevated 5 or more years after AH (RR=3.5, 95% CI, 1.0–9.6). Progression risks for SH (RR=2.0, 95% CI, 0.9–4.5) and CH (RR=2.8, 95% CI, 1.0–7.9) were substantially lower and only slightly higher than the progression risk for DPEM. The higher progression risks for AH could foster management guidelines based on markedly different progression risks for atypical vs non-atypical EH.

    Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

    Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step. Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network. Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem. Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
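    The published protocol is a secure multi-party construction with formally analysed guarantees. Purely as a simplified illustration of deduplicating horizontally partitioned records via deterministic linkage keys without exchanging identifiers in the clear, the sketch below has each custodian submit only keyed hashes of its linkage fields to a coordinator that flags cross-site duplicates. The shared key, field names, and semi-honest coordinator are assumptions of this sketch, not elements of the paper's protocol, and the guarantees here are weaker.

```python
import hashlib
import hmac

# Assumed for illustration: custodians share a secret key agreed out of band,
# and the coordinator is semi-honest (it sees only keyed hashes, never raw
# identifiers).
SHARED_KEY = b"example-shared-secret"

def linkage_hash(record, fields=("national_id", "birth_date")):
    """Custodian side: keyed hash over the deterministic linkage fields."""
    message = "|".join(str(record[f]) for f in fields).encode()
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def find_duplicates(hashes_by_site):
    """Coordinator side: flag linkage hashes reported by more than one site."""
    seen = {}
    for site, hashes in hashes_by_site.items():
        for h in hashes:
            seen.setdefault(h, []).append(site)
    return {h: sites for h, sites in seen.items() if len(sites) > 1}

# Two laboratories holding the same (made-up) patient record.
lab_a = [{"national_id": "010170-12345", "birth_date": "1970-01-01"}]
lab_b = [{"national_id": "010170-12345", "birth_date": "1970-01-01"}]
submitted = {
    "lab_A": [linkage_hash(r) for r in lab_a],
    "lab_B": [linkage_hash(r) for r in lab_b],
}
print(find_duplicates(submitted))   # one hash present at both lab_A and lab_B
```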

    Biomedical informatics and translational medicine

    Biomedical informatics involves a core set of methodologies that can provide a foundation for crossing the "translational barriers" associated with translational medicine. To this end, the fundamental aspects of biomedical informatics (e.g., bioinformatics, imaging informatics, clinical informatics, and public health informatics) may be essential in helping improve the ability to bring basic research findings to the bedside, evaluate the efficacy of interventions across communities, and enable the assessment of the eventual impact of translational medicine innovations on health policies. Here, a brief description is provided for a selection of key biomedical informatics topics (Decision Support, Natural Language Processing, Standards, Information Retrieval, and Electronic Health Records) and their relevance to translational medicine. Based on contributions and advancements in each of these topic areas, the article proposes that biomedical informatics practitioners ("biomedical informaticians") can be essential members of translational medicine teams.

    Evolving a national clinical trials learning health system

    Clinical trials generate key evidence to inform decision making, and also benefit participants directly. However, clinical trials frequently fail, often struggle to enroll participants, and are expensive. Part of the problem with trial conduct may be the disconnected nature of clinical trials, preventing rapid data sharing, generation of insights and targeted improvement interventions, and identification of knowledge gaps. In other areas of healthcare, a learning health system (LHS) has been proposed as a model to facilitate continuous learning and improvement. We propose that an LHS approach could greatly benefit clinical trials, allowing for continuous improvements to trial conduct and efficiency. A robust trial data sharing system, continuous analysis of trial enrollment and other success metrics, and development of targeted trial improvement interventions are potentially key components of a Trials LHS, reflecting the learning cycle and allowing for continuous trial improvement. Through the development and use of a Trials LHS, clinical trials could be treated as a system, producing benefits to patients, advancing care, and decreasing costs for stakeholders.