
    Systematic identification of abundant A-to-I editing sites in the human transcriptome

    RNA editing by members of the double-stranded RNA-specific ADAR family leads to site-specific conversion of adenosine to inosine (A-to-I) in precursor messenger RNAs. Editing by ADARs is believed to occur in all metazoa and is essential for mammalian development. Currently, only a limited number of human ADAR substrates are known, although indirect evidence suggests that a substantial fraction of all pre-mRNAs is affected. Here we describe a computational search for ADAR editing sites in the human transcriptome, using millions of available expressed sequences. We mapped 12,723 A-to-I editing sites in 1,637 different genes, with an estimated accuracy of 95%, raising the number of known editing sites by two orders of magnitude. We experimentally validated our method by verifying the occurrence of editing in 26 novel substrates. A-to-I editing in humans occurs primarily in non-coding regions of the RNA, typically in Alu repeats. Analysis of the large set of editing sites indicates a role for editing in controlling dsRNA stability.
    Comment: Pre-print version. See http://dx.doi.org/10.1038/nbt996 for a reprint.
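    Because inosine is read as guanosine during reverse transcription and sequencing, A-to-I editing shows up as A-to-G mismatches when expressed sequences are aligned to the genome. The sketch below is a deliberately simplified illustration of that signal, not the authors' pipeline: the function name `candidate_editing_sites`, the `min_support` threshold, and the assumption that reads are already gaplessly aligned to a same-strand reference (with `-` marking uncovered positions) are all hypothetical.

```python
from collections import Counter


def candidate_editing_sites(genomic, reads, min_support=2):
    """Return 0-based reference positions where an A is read as G
    in at least `min_support` aligned expressed sequences.

    Hypothetical simplification: every read is pre-aligned, gapless,
    same strand as `genomic`, and padded with '-' where it has no coverage.
    """
    support = Counter()
    for read in reads:
        if len(read) != len(genomic):
            raise ValueError("reads must be pre-aligned to the reference length")
        for pos, (ref_base, read_base) in enumerate(zip(genomic, read)):
            # Inosine pairs like guanosine, so an edited A appears as G.
            if ref_base == "A" and read_base == "G":
                support[pos] += 1
    # Keep only positions supported by enough independent sequences.
    return sorted(pos for pos, count in support.items() if count >= min_support)


print(candidate_editing_sites("CAGATACA", ["CGGATACA", "CGGATGCA", "--GATGCA"]))
# prints [1, 5]
```

    A real search would also have to separate editing from genomic SNPs, sequencing errors, and strand ambiguity; none of those filters are modeled here.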

    Large-Scale Protein Annotation through Gene Ontology

    Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a new perspective: examining genomes and transcriptomes through the multiple, hierarchical structure of Gene Ontology (GO). We report here the development of GO Engine, a computational platform for GO annotation, and an analysis of the resulting GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and on protein domain analysis. Text information analysis and a multiparameter cellular localization prediction tool were also used to increase annotation accuracy and to predict novel annotations. The majority of proteins corresponding to full-length mRNAs in GenBank, and the majority of proteins in the NR (nonredundant) protein database, were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site.
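    In GO, annotating a protein with a specific term implicitly annotates it with every more general ancestor of that term along is_a/part_of links (the GO "true path rule"), which is what makes the hierarchy useful for analysis. A minimal sketch of that propagation, assuming the ontology is supplied as a hypothetical `parents` mapping from each term to its direct parents; the term names are illustrative placeholders rather than real GO identifiers, and this is not GO Engine's code.

```python
def propagate(direct_terms, parents):
    """Return the direct annotations plus every ancestor reachable in the DAG."""
    result = set()
    stack = list(direct_terms)
    while stack:
        term = stack.pop()
        if term in result:
            continue  # already visited via another path in the DAG
        result.add(term)
        # Walk upward: each parent of an annotated term is also implied.
        stack.extend(parents.get(term, ()))
    return result


# Hypothetical toy ontology (placeholder names, not real GO IDs).
parents = {
    "kinase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
}
print(sorted(propagate({"kinase_activity"}, parents)))
# prints ['catalytic_activity', 'kinase_activity', 'molecular_function']
```

    An explicit stack with a visited check is used so that terms reachable through several parents are counted only once.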

    Prediction of Influenza Complications: Development and Validation of a Machine Learning Prediction Model to Improve and Expand the Identification of Vaccine-Hesitant Patients at Risk of Severe Influenza Complications

    Influenza vaccinations are recommended for high-risk individuals, but few population-based strategies exist to identify individual risks. Patient-level data from unvaccinated individuals, stratified into retrospective cases (n = 111,022) and controls (n = 2,207,714), informed a machine learning model designed to create an influenza risk score; the model was called the Geisinger Flu-Complications Flag (GFlu-CxFlag). The flag was created and validated on a cohort of 604,389 unique individuals. Risk scores were generated for influenza cases; the complication rate for individuals without influenza was estimated to adjust for unrelated complications. Shapley values were used to examine the model's correctness and to demonstrate its dependence on different features. Bias was assessed for race and sex. Inverse propensity weighting was used in the derivation stage to correct for biases. The GFlu-CxFlag model was compared with the pre-existing Medial EarlySign Flu Algomarker and with existing risk guidelines that describe high-risk patients who would benefit from influenza vaccination. The GFlu-CxFlag outperformed other traditional risk-based models: its area under the curve (AUC) was 0.786 [0.783–0.789], compared with 0.694 [0.690–0.698] (p-value < 0.00001). The presence of acute and chronic respiratory diseases, age, and previous emergency department visits contributed most to the GFlu-CxFlag model's predictions. When higher numerical scores were assigned to more severe complications, the GFlu-CxFlag AUC increased to 0.828 [0.823–0.833], with excellent discrimination in the final model used to perform risk stratification of the population. The GFlu-CxFlag identifies high-risk individuals better than existing models based on vaccination guidelines, and thus enables population-based risk stratification for individual risk assessment and for deployment in vaccine hesitancy reduction programs in our health system.
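    The AUC values reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half; an AUC of 0.786 means roughly 78.6% of case-control pairs are ranked correctly. A minimal sketch of that pairwise (Mann-Whitney) formulation, using hypothetical toy scores; the function name `roc_auc` is an assumption, this is not the GFlu-CxFlag evaluation code, and its quadratic loop is only meant for small illustrative inputs.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # case correctly ranked above control
            elif p == n:
                wins += 0.5  # tie contributes half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores: label 1 = case with complication, 0 = control.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
# prints 0.75
```

    The pairwise definition is used here because it makes the meaning of the statistic explicit; library implementations compute the same quantity more efficiently by sorting scores.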