16 research outputs found

    Enhanced protein isoform characterization through long-read proteogenomics

    [Background] The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts that can be used to predict full-length protein isoforms. [Results] We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. [Conclusions] Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research. This work was supported by National Institutes of Health (NIH) grant R35GM142647 (G.M.S.), NIH grant R35GM126914 (L.M.S.), and the Jackson Laboratory (A.D.M.). The codeathon that initiated the project was supported by the NIH STRIDES Initiative at the NIH. Peer reviewed

    A framework for future national pediatric pandemic respiratory disease severity triage: The HHS pediatric COVID-19 data challenge

    Introduction: With persistent incidence, incomplete vaccination rates, confounding respiratory illnesses, and few therapeutic interventions available, COVID-19 continues to be a burden on the pediatric population. During a surge, it is difficult for hospitals to direct limited healthcare resources effectively. While the overwhelming majority of pediatric infections are mild, there have been life-threatening exceptions that illuminated the need to proactively identify pediatric patients at risk of severe COVID-19 and other respiratory infectious diseases. However, no nationwide capability exists for developing validated computational tools that identify at-risk pediatric patients using real-world data. Methods: HHS ASPR BARDA sought, through a competitive challenge, to create computational models that address two clinically important questions using the National COVID Cohort Collaborative: (1) Of pediatric patients who test positive for COVID-19 in an outpatient setting, who is at risk for hospitalization? (2) Of pediatric patients who test positive for COVID-19 and are hospitalized, who is at risk of needing mechanical ventilation or cardiovascular interventions? Results: This challenge was the first multi-agency, coordinated computational challenge carried out by the federal government in response to a public health emergency. Fifty-five computational models were evaluated across both tasks, and two winners and three honorable mentions were selected. Conclusion: This challenge serves as a framework for how the government, research communities, and large data repositories can be brought together to source solutions when resources are strained during a pandemic.

    Recurrence of early stage colon cancer predicted by expression pattern of circulating microRNAs.

    Systemic treatment of patients with early-stage cancers attempts to eradicate occult metastatic disease to prevent recurrence and increased morbidity. However, prediction of recurrence from an analysis of the primary tumor is limited because disseminated cancer cells represent only a small subset of the primary lesion. Here we analyze the expression of circulating microRNAs (miRs) in serum obtained pre-surgically from patients with early-stage colorectal cancer. Groups of five patients with and five without disease recurrence were used to identify an informative panel of circulating miRs, using quantitative PCR of genome-wide miR expression as well as a set of published candidate miRs. A panel of six informative miRs (miR-15a, miR-103, miR-148a, miR-320a, miR-451, miR-596) was derived from this analysis and evaluated in a separate validation set of thirty patients. Hierarchical clustering of the expression levels of these six circulating miRs and Kaplan-Meier analysis showed that the risk of disease recurrence of early-stage colon cancer can be predicted by this panel of miRs, which are measurable in the circulation at the time of diagnosis (P = 0.0026; hazard ratio 5.4; 95% CI 1.9 to 15).
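    The abstract relies on Kaplan-Meier analysis of time to recurrence but does not describe the computation. As an illustration only, the sketch below implements the textbook Kaplan-Meier (product-limit) estimator: at each distinct recurrence time, survival is multiplied by (1 - events / patients at risk), and censored patients leave the risk set without producing a step. The function name `kaplan_meier` and the example follow-up times are hypothetical and are not the study's data or software.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, recurrence-free probability) steps of the Kaplan-Meier estimate.

    times:  follow-up time per patient (e.g. months to recurrence or to censoring)
    events: 1 if recurrence was observed at that time, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Distinct times at which at least one recurrence occurred, in ascending order.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    steps = []
    for t in event_times:
        # Risk set: patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Number of recurrences observed exactly at time t.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - d / at_risk
        steps.append((t, survival))
    return steps


if __name__ == "__main__":
    # Hypothetical follow-up data (months), not taken from the study.
    follow_up = [5, 8, 12, 20, 30, 36, 36, 40]
    recurred = [1, 1, 0, 1, 0, 0, 0, 0]
    for t, s in kaplan_meier(follow_up, recurred):
        print(t, round(s, 3))
```

    The patient censored at 12 months never produces a step of its own, but it is no longer counted in the risk set at 20 months, which is how censoring lowers the denominator without being treated as a recurrence.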

    Single-Molecule Real-Time (SMRT) Full-Length RNA-Sequencing Reveals Novel and Distinct mRNA Isoforms in Human Bone Marrow Cell Subpopulations

    Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA sequencing. This analysis revealed an approximately 5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic of bone marrow subpopulations. A detailed analysis of messenger RNA (mRNA) isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry, which validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were detectable only by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.

    miR expression levels in serum samples from patients with or without recurrence of early stage colon cancer.

    Validation study for the six miRs identified in the pilot study. Patient characteristics are provided in Table 1. A, Expression levels based on the cycle threshold (Ct) values of the qRT-PCR (left axis) and the calculated miR concentration (right axis). B, Principal component analysis of the data, with the two groups shown in black and red symbols, respectively.

    Study design (A) and time-to-disease recurrence in early stage colon cancer patients in the pilot study (B).

    A, From a pilot study of 10 patients, candidate miRs predictive of disease recurrence were identified and then tested for their prediction of disease recurrence in a validation study. B, Kaplan-Meier plot of disease recurrence in patients in the pilot study. Patients with disease recurrence (n = 5) vs. no recurrence (n = 5): chi-square 5.47, p = 0.0193; median time to recurrence = 26 months. The Gehan-Breslow-Wilcoxon test was used.
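    The caption reports a chi-square of 5.47 from the Gehan-Breslow-Wilcoxon test. As a rough sketch of what that test computes, the code below implements the common textbook form: a weighted log-rank statistic in which each recurrence time is weighted by the number of patients at risk, so early differences count more than late ones. The statistic U²/V is compared with a chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(x / 2))`. The function name and the example data are hypothetical, and the statistical software used in the study may handle ties or variance slightly differently.

```python
import math
from typing import List, Tuple


def gehan_breslow_wilcoxon(times: List[float], events: List[int],
                           groups: List[int]) -> Tuple[float, float]:
    """Two-group Gehan-Breslow generalized Wilcoxon test (weight = size of risk set).

    groups: 0 or 1 per patient. Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    u = 0.0    # weighted sum of (observed - expected) recurrences in group 1
    var = 0.0  # weighted hypergeometric variance of u
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        w = n  # Gehan-Breslow weight: number of patients at risk at time t
        u += w * (d1 - d * n1 / n)
        if n > 1:  # with a single patient at risk the variance term is zero
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    chi2 = u * u / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical data: group 1 recurs early, group 0 is censored late.
    t = [3, 6, 9, 12, 15, 30, 32, 34, 36, 38]
    e = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    g = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(gehan_breslow_wilcoxon(t, e, g))
```

    Replacing the weight `w = n` with `w = 1` turns the same loop into the ordinary log-rank test, which is one way to see how the two tests differ.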

    miR expression levels in serum samples from patients with or without recurrence of early stage colon cancer (Pilot Study).

    Candidate miR approach. Circulating levels of the 16 miRs indicated on the x-axis, which had been published as differentially expressed between colon cancer and non-malignant colon tissues, were analyzed [21], [34], [35]. Pre-surgery serum samples were from patients in the pilot study. Patients had been followed for disease recurrence, and the respective data are in Fig. 1B. A, Concentration of circulating miRs (relative to U6). Note the log scale, which covers a 100,000-fold range. B, Ratio of expression between patient groups. Although miR-20, miR-195, and miR-320 showed a ≥2-fold downregulation, and miR-135b and miR-615 a ≥2-fold upregulation, in serum from patients with disease recurrence, none of the comparisons reached statistical significance by ANOVA (p > 0.05).
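    The caption expresses miR concentrations relative to U6 and flags ≥2-fold group differences, but does not state the exact formula. The sketch below assumes the widely used 2^-ΔCt method (ΔCt = Ct of the miR minus Ct of U6, assuming roughly 100% PCR efficiency), takes the ratio of group means, and labels ratios of at least 2 as "up" and at most 0.5 as "down". The function names, the threshold parameter, and the example Ct values are hypothetical illustrations, not the study's pipeline.

```python
from statistics import mean
from typing import List


def relative_expression(ct_mir: float, ct_u6: float) -> float:
    """miR level relative to U6 via 2^-ΔCt (assumes ~100% amplification efficiency)."""
    delta_ct = ct_mir - ct_u6
    return 2.0 ** (-delta_ct)


def fold_change(recurrence: List[float], no_recurrence: List[float]) -> float:
    """Ratio of mean relative expression: recurrence group / no-recurrence group."""
    return mean(recurrence) / mean(no_recurrence)


def classify(ratio: float, threshold: float = 2.0) -> str:
    """Label a ratio as up-, down-regulated, or unchanged at the given fold threshold."""
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical (Ct_miR, Ct_U6) pairs per patient, not taken from the study.
    recur = [relative_expression(28.0, 25.0), relative_expression(27.0, 25.0)]
    no_recur = [relative_expression(30.0, 25.0), relative_expression(29.0, 25.0)]
    ratio = fold_change(recur, no_recur)
    print(ratio, classify(ratio))
```

    A fold-change label like this is purely descriptive. As the caption notes, even ≥2-fold differences may fail a significance test such as ANOVA when groups are small and variable. An alternative design would average ΔCt values first (a geometric mean on the linear scale), which is less sensitive to single outlying samples.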