
    PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data

    Background: Of the 5,484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes.
    Results: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2,434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1,905 and 1,540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence values above 50%).
    Conclusion: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.
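    The guilt-by-association idea in step (1) of the abstract can be illustrated with a small sketch. This is not the Gonna implementation: the use of Pearson correlation as the similarity measure, the nearest-neighbour vote, the parameter k, and the function names are all assumptions chosen for illustration, and the toy profiles are invented.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def predict_go_terms(query_profile, annotated, k=3):
    """Hypothetical predictor: score each GO term by the fraction of the
    k most similar annotated genes that carry it.

    annotated maps gene name -> (profile, set of GO term labels).
    """
    ranked = sorted(
        annotated.items(),
        key=lambda item: pearson(query_profile, item[1][0]),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = defaultdict(int)
    for _gene, (_profile, terms) in neighbours:
        for term in terms:
            votes[term] += 1
    return {term: count / len(neighbours) for term, count in votes.items()}


# Toy data (invented): expression-like profiles for three annotated genes.
annotated_genes = {
    "geneA": ([1, 2, 3, 4], {"antigenic variation"}),
    "geneB": ([2, 4, 6, 8.5], {"antigenic variation", "rosetting"}),
    "geneC": ([4, 3, 2, 1], {"translation"}),
}
scores = predict_go_terms([1, 2, 3, 5], annotated_genes, k=2)
```

    The vote fraction here is only a crude stand-in for a score. The abstract's steps (2) and (3), estimating confidence per data source and combining sources, are not modelled in this sketch.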

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

    Update on the biochemistry of chlorophyll breakdown

    In land plants, chlorophyll is broken down to colorless linear tetrapyrroles in a highly conserved multi-step pathway. The pathway is termed the 'PAO pathway' because the opening of the chlorin macrocycle present in chlorophyll, catalyzed by pheophorbide a oxygenase (PAO), the key enzyme of the pathway, provides the characteristic structural basis found in all further downstream chlorophyll breakdown products. To date, most of the biochemical steps of the PAO pathway have been elucidated, and genes encoding many of the chlorophyll catabolic enzymes have been identified. This review summarizes the current knowledge on the biochemistry of the PAO pathway and provides insight into recent progress in the field, which indicates that the pathway is more complex than previously thought.