
    Scalable and accurate deep learning for electronic health records

    Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart. Comment: Published version from https://www.nature.com/articles/s41746-018-0029-

    Bioinformatic Analysis and Post-Translational Modification Crosstalk Prediction of Lysine Acetylation

    Recent proteomics studies suggest high abundance and a much wider role for lysine acetylation (K-Ac) in cellular functions. Nevertheless, cross influence between K-Ac and other post-translational modifications (PTMs) has not been carefully examined. Here, we used a variety of bioinformatics tools to analyze several available K-Ac datasets. Using gene ontology databases, we demonstrate that K-Ac sites are found in all cellular compartments. KEGG analysis indicates that the K-Ac sites are found on proteins responsible for a diverse and wide array of vital cellular functions. Domain structure prediction shows that K-Ac sites are found throughout a wide variety of protein domains, including those in heat shock proteins and those involved in cell cycle functions and DNA repair. Secondary structure prediction proves that K-Ac sites are preferentially found in ordered structures such as alpha helices and beta sheets. Finally, by mutating K-Ac sites in silico and predicting the effect on nearby phosphorylation sites, we demonstrate that the majority of lysine acetylation sites have the potential to impact protein phosphorylation, methylation, and ubiquitination status. Our work validates earlier smaller-scale studies on the acetylome and demonstrates the importance of PTM crosstalk for regulation of cellular function

    k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction

    In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463 320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors

    Association between Acquired Uniparental Disomy and Homozygous Mutations and HER2/ER/PR Status in Breast Cancer

    Background: Genetic alterations in cellular signaling networks are a hallmark of cancer; however, effective methods to discover them are lacking. A novel form of abnormality called acquired uniparental disomy (aUPD) was recently found to pinpoint the region of mutated genes in various cancers, thereby identifying the region for next-generation sequencing. Methods/Principal Findings: We retrieved large genomic data sets from the Gene Expression Omnibus database to perform genome-wide analysis of aUPD in breast tumor samples and cell lines using approaches that can reliably detect aUPD. aUPD was identified in 52.29% of the tumor samples. The most frequent aUPD regions were located at chromosomes 2q, 3p, 5q, 9p, 9q, 10q, 11q, 13q, 14q and 17q. We evaluated the data for any correlation between the most frequent aUPD regions and HER2/neu, ER, and PR status, and found a statistically significant correlation between the recurrent regions of aUPD and triple negative (TN) breast cancers. aUPD at chromosome 17q (VEZF1, WNT3), 3p (SUMF1, GRM7), 9p (MTAP, NFIB) and 11q (CASP1, CASP4, CASP5) are predictors for TN. The frequency of aUPD was found to be significantly higher in TN breast cancer cases compared to HER2/neu-positive and/or ER or PR-positive cases. Furthermore, using previously published mutation data, we found TP53 homozygously mutated in cell lines having aUPD in that locus. Conclusions/Significance: We conclude that aUPD is a common and non-random molecular feature of breast cancer that is most prominent in triple negative cases. As aUPD regions are different among the main pathological subtypes, specific aUPD regions may aid the sub-classification of breast cancer. In addition, we provide statistical support using TP53 as an example that identifying aUPD regions can be an effective approach in finding aberrant genes. We thus conclu

    Establishing a translational genomics infrastructure in pediatric cancer: the GREAT KIDS experience

    We have recently established a biobanking and sequencing pipeline at the University of Chicago dubbed Genomics for Risk Evaluation and Anticancer Therapy in Kids. We plan to intersect family and personal history of cancer and other diseases with multidimensional genomic profiling in order to: understand how genetics may have contributed to the development of cancer for each child, and investigate the spectrum of genomic alterations within a tumor spatially (e.g., primary site vs distant metastasis) and over time (e.g., diagnosis vs relapse). This review highlights some of the practical considerations involved in creating such a program including the capacity to use our platform for multi-institutional collaborations

    Creating a data commons: The INternational Soft Tissue SaRcoma ConsorTium (INSTRuCT)

    In this article, we will discuss the genesis, evolution, and progress of the INternational Soft Tissue SaRcoma ConsorTium (INSTRuCT), which aims to foster international research and collaboration focused on pediatric soft tissue sarcoma. We will begin by highlighting the current state of clinical research for pediatric soft tissue sarcomas, including rhabdomyosarcoma and non-rhabdomyosarcoma soft tissue sarcoma. We will then explore challenges and research priorities, describe the development of INSTRuCT, and discuss how the consortium aims to address key research priorities

    Mutations that disrupt PHOX2B interaction with the neuronal calcium sensor HPCAL1 impede cellular differentiation in neuroblastoma

    Heterozygous germline mutations in PHOX2B, a transcriptional regulator of sympathetic neuronal differentiation, predispose to diseases of the sympathetic nervous system, including neuroblastoma and congenital central hypoventilation syndrome (CCHS). Although the PHOX2B variants in CCHS largely involve expansions of the second polyalanine repeat within the C-terminus of the protein, those associated with neuroblastic tumors are nearly always frameshift and truncation mutations. To test the hypothesis that the neuroblastoma-associated variants exert their effects through loss or gain of protein–protein interactions, we performed a large-scale yeast two-hybrid screen using both wild-type (WT) and six different mutant PHOX2B proteins against over 10 000 human genes. The neuronal calcium sensor protein HPCAL1 (VILIP-3) exhibited strong binding to WT PHOX2B and a CCHS-associated polyalanine expansion mutant but only weakly or not at all to neuroblastoma-associated frameshift and truncation variants. We demonstrate that both WT PHOX2B and the neuroblastoma-associated R100L missense and the CCHS-associated alanine expansion variants induce nuclear translocation of HPCAL1 in a Ca2+-independent manner, while the neuroblastoma-associated 676delG frameshift and K155X truncation mutants impair subcellular localization of HPCAL1, causing it to remain in the cytoplasm. HPCAL1 did not appreciably influence the ability of WT PHOX2B to transactivate the DBH promoter, nor did it alter the decreased transactivation potential of PHOX2B variants in 293T cells. Abrogation of the PHOX2B–HPCAL1 interaction by shRNA knockdown of HPCAL1 in neuroblastoma cells expressing PHOX2B led to impaired neurite outgrowth with transcriptional profiles indicative of inhibited sympathetic neuronal differentiation. 
    Our results suggest that certain PHOX2B variants associated with neuroblastoma pathogenesis, because of their inability to bind to key interacting proteins such as HPCAL1, may predispose to this malignancy by impeding the differentiation of immature sympathetic neurons

    Prognostic significance of pattern and burden of metastatic disease in patients with stage 4 neuroblastoma: A study from the International Neuroblastoma Risk Group database

    Neuroblastoma is a childhood cancer with remarkably divergent tumour behaviour and the presence of metastatic disease is a powerful predictor of adverse outcome. However, the importance of the involvement of specific metastatic sites or overall metastatic burden in determining outcome has not been fully explored. We analysed data from the International Neuroblastoma Risk Group database for 2250 patients with stage 4 disease treated from 1990 to 2002. Metastatic burden was assessed using a 'metastatic site index' (MSI), a score based on the number of metastatic systems involved. Overall, involvement of bone marrow, bone, lung, central nervous system, or other sites was associated with worse outcome. For patients aged >= 18 months, involvement of liver had the greatest impact on outcome and was associated with tumour MYCN amplification and adrenal primary and lung metastases. Increased MSI was associated with worse outcome and higher baseline ferritin/lactate dehydrogenase. We explored the impact of initial treatment approach on these associations. Limiting the analysis to patients allocated to protocols including stem cell transplant (SCT), there was no longer an association of outcome with metastatic involvement of any individual system or increasing MSI. Thus, treatment escalation with SCT (and the addition of differentiating agents to maintenance therapy) appears to have provided maximal benefit to patients with greatest metastatic disease burden. These findings underscore the importance of examining prognostic factors in the context of specific treatments since the addition of new therapies may change or even negate the predictive impact of a particular variable. (C) 2016 Elsevier Ltd. All rights reserved