
50 research outputs found

Assessing functional annotation transfers with inter-species conserved coexpression: application to Plasmodium falciparum

Author: Bréhélin Laurent
Florent Isabelle
Gascuel Olivier
Maréchal Éric
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Plasmodium falciparum is the main causative agent of malaria. Of the 5,484 predicted genes of P. falciparum, about 57% do not have sufficient sequence similarity to characterized genes in other species to warrant functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes. Gene expression data have been widely used in recent years to help functional annotation within a species via the so-called Guilt By Association (GBA) principle. Results: We propose a new method that uses gene expression data to assess inter-species annotation transfers. Our approach starts from a set of likely orthologs between a reference species (here S. cerevisiae and D. melanogaster) and a query species (P. falciparum). It aims at identifying clusters of coexpressed genes in the query species whose coexpression has been conserved in the reference species. These conserved clusters of coexpressed genes are then used to assess annotation transfers between genes with low sequence similarity, enabling reliable transfers of annotations from the reference to the query species. The approach was used with transcriptomic data sets of P. falciparum, S. cerevisiae and D. melanogaster, and enabled us to propose with high confidence new or refined annotations for several dozen hypothetical or putative P. falciparum genes. Notably, we revised the annotation of genes involved in ribosomal proteins and ribosome biogenesis and assembly, thus highlighting several potential drug targets. Conclusions: Our approach uses both sequence similarity and gene expression data to help inter-species gene annotation transfers. Experiments show that this strategy improves the accuracy achieved when using solely sequence similarity and outperforms the accuracy of the GBA approach. In addition, our experiments with P. falciparum show that it can infer a function for numerous hypothetical genes.

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL-CEA

ProdInra

Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity

Author: Boureux Anthony
Bréhélin Laurent
Commes Thérèse
Philippe Nicolas
Rivals Éric
Tarhio Jorma
Publication venue: Oxford University Press
Publication date: 16/06/2009

Ultra high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped on a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length affect the prediction capacity of sequence census experiments. Here we suggest a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it significantly increases with the position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions, if they are mapped on the genome. This indicates that digital transcriptomics may help to characterize the wealth of yet undiscovered, low-abundance transcripts.

PubMed Central

HAL Descartes

A Plasmodium falciparum FcB1-schizont-EST collection providing clues to schizont specific gene structure and polymorphism

Author: Artiguenave François
Bréhélin Laurent
Charneau Sébastien
Da Silva Corinne
Florent Isabelle
Gascuel Olivier
Grellier Philippe
Guillaume Elodie
Maréchal Eric
Porcel Betina M
Wincker Patrick
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: The Plasmodium falciparum genome (3D7 strain), published in 2002, revealed ~5,400 genes, mostly based on in silico predictions. Experimental data are therefore required for structural and functional assessments of P. falciparum genes and expression, and polymorphism data are also necessary to exploit genomic information to further qualify therapeutic target candidates. Here, we undertook a large-scale analysis of a P. falciparum FcB1-schizont-EST library previously constructed by suppression subtractive hybridization (SSH) to study genes expressed during merozoite morphogenesis, with the aim of: 1) obtaining an exhaustive collection of schizont-specific ESTs, 2) experimentally validating or correcting P. falciparum gene models and 3) pinpointing genes displaying protein polymorphism between the FcB1 and 3D7 strains. Results: A total of 22,125 clones randomly picked from the SSH library were sequenced, yielding 21,805 usable ESTs that were then clustered on the P. falciparum genome. This allowed identification of 243 protein coding genes, including 121 previously annotated as hypothetical. Statistical analysis of GO terms, when available, indicated significant enrichment in genes involved in "entry into host-cells" and "actin cytoskeleton". Although most ESTs do not span full-length gene reading frames, detailed sequence comparison of FcB1-ESTs versus 3D7 genomic sequences allowed the confirmation of exon/intron boundaries in 29 genes, the detection of new boundaries in 14 genes and identification of protein polymorphism for 21 genes. In addition, a large number of non-protein coding ESTs were identified, mainly matching the two A-type rRNA units (on chromosomes 5 and 7) and, to a lesser extent, two atypical rRNA loci (on chromosomes 1 and 8), TARE subtelomeric regions (several chromosomes) and the recently described telomerase RNA gene (chromosome 9). Conclusion: This FcB1-schizont-EST analysis confirmed the actual expression of 243 protein coding genes, allowing the correction of structural annotations for a quarter of these sequences. In addition, this analysis demonstrated the actual transcription of several remarkable non-protein coding loci: two atypical rRNA loci, TARE regions and the telomerase RNA gene. Together with other collections of P. falciparum ESTs, usually generated from mixed parasite stages, this collection of FcB1-schizont-ESTs provides valuable data to gain further insight into P. falciparum gene structure, polymorphism and expression.

HAL Evry

Crossref

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Directory of Open Access Journals

Author Correction: Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author: Bessière Chloé
Bréhélin Laurent
Carninci Piero
Chatelain Clément
de Hoon Michiel J. L.
FANTOM consortium
Frith Martin C.
Grapotte Mathys
Hasegawa Akira
Hayashizaki Yoshihide
Itoh Masayoshi
Kasukawa Takeya
Kojima-Ishiyama Miki
Lecellier Charles-Henri
Menichelli Christophe
Murata Mitsuyoshi
Nishiyori-Sueki Hiromi
Noguchi Shuhei
Noma Shohei
Ramilowski Jordan A.
Saraswat Manu
Severin Jessica
Suzuki Harukazu
Tagami Michihira
Wasserman Wyeth W.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

edoc

PubMed Central

PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data

Author: A Gasch
A Mateos
A Vazquez
C Brun
D LaCount
D Lockhart
E Dahl
E Marcotte
E Pizzi
E Sonnhammer
G Yona
J Dougherty
J Sachs
J Shock
J Young
Jean-François Dufayard
K Le Roch
L Dice
L Florens
L Wu
Laurent Bréhélin
M Gardner
M Llinas
MB Eisen
MPS Brown
MR Chmielewski
O Bastion
Olivier Gascuel
P Langley
P Toronen
PT Spellman
S Altschul
T Hastie
Y Chen
Y Zhou
Z Bozdech
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Of the 5,484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes. Results: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2,434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g., Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1,905 and 1,540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence values above 50%). Conclusion: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author: Abugessaisa Imad
Aitken Stuart
Aken Bronwen L.
Alam Intikhab
Alam Tanvir
Alasiri Rami
Alhendi Ahmad M. N.
Alinejad-Rokny Hamid
Alvarez Mariano J.
Andersson Robin
Arakawa Takahiro
Araki Marito
Arbel Taly
Archer John
Archibald Alan L.
Arner Erik
Arner Peter
Asai Kiyoshi
Ashoor Haitham
Astrom Gaby
Babina M.
Baillie J.K.
Bajic V.B.
Bajpai A.
Baker S.
Baldarelli R.M.
Balic A.
Bansal M.
Batagov A.O.
Batzoglou S.
Beckhouse A.G.
Beltrami A.P.
Beltrami C.A.
Bertin Nicolas
Bessière Chloé
Bhattacharya S.
Bickel P.J.
Blake J.A.
Blanchette M.
Bodega B.
Bonetti A.
Bono H.
Bornholdt J.
Bougouffa S.
Boyd M.
Breda J.
Brombacher F.
Brown J.B.
Bréhélin L.
Böttcher M.
Bult C.J.
Burroughs A.M.
Burt D.W.
Busch A.
Caglio G.
Califano A.
Cameron C.J.
Cannistraci C.V.
Carbone A.
Carlisle A.J.
Carninci Piero
Carter K.W.
Cesselli D.
Chang J.-C.
Chatelain Clément
Chen J.C.
Chen Y.
Chierici M.
Christodoulou J.
Ciani Y.
Clark E.L.
Coskun M.
Dalby M.
Dalla E.
Daub C.O.
Davis C.A.
de Hoon Michiel J. L.
de Rie D.
Denisenko E.
Deplancke B.
Detmar M.
Deviatiiarov R.
Di Bernardo D.
Diehl A.D.
Dieterich L.C.
Dimont E.
Djebali S.
Dohi T.
Dostie J.
Drablos F.
Edge A.S.B.
Edinger M.
Ehrlund A.
Ekwall K.
Elofsson A.
Endoh M.
Enomoto H.
Enomoto S.
Faghihi M.
Fagiolini M.
FANTOM consortium.
Farach-Carson M.C.
Faulkner G.J.
Favorov A.
Fernandes A.M.
Ferrai C.
Forrest A.R.R.
Forrester L.M.
Forsberg M.
Fort A.
Francescatto M.
Freeman T.C.
Frith Martin C.
Fukuda S.
Funayama M.
Furlanello C.
Furuno M.
Furusawa C.
Gao H.
Gazova I.
Gebhard C.
Geier F.
Geijtenbeek T.B.H.
Ghosh S.
Ghosheh Y.
Gingeras T.R.
Gojobori T.
Goldberg T.
Goldowitz D.
Gough J.
Grapotte Mathys
Greco D.
Gruber A.J.
Guhl S.
Guigo R.
Guler R.
Gusev O.
Gustincich S.
Ha T.J.
Haberle V.
Hale P.
Hallstrom B.M.
Hamada M.
Handoko L.
Hara M.
Harbers M.
Harrow J.
Harshbarger J.
Hase T.
Hasegawa Akira
Hashimoto K.
Hatano T.
Hattori N.
Hayashi R.
Hayashizaki Yoshihide
Herlyn M.
Hettne K.
Heutink P.
Hide W.
Hitchens K.J.
Hon C.C.
Hori F.
Horie M.
Horimoto K.
Horton P.
Hou R.
Huang E.
Huang Y.
Hugues R.
Hume D.
Ienasescu H.
Iida K.
Ikawa T.
Ikemura T.
Ikeo K.
Inoue N.
Ishizu Y.
Ito Y.
Itoh Masayoshi
Ivshina A.V.
Jankovic B.R.
Jenjaroenpun P.
Johnson R.
Jorgensen M.
Jorjani H.
Joshi A.
Jurman G.
Kaczkowski B.
Kai C.
Kaida K.
Kajiyama K.
Kaliyaperumal R.
Kaminuma E.
Kanaya T.
Kaneda H.
Kapranov P.
Kasianov A.S.
Kasukawa Takeya
Katayama T.
Kato S.
Kawaguchi S.
Kawai J.
Kawaji H.
Kawamoto H.
Kawamura Y.I.
Kawasaki S.
Kawashima T.
Kempfle J.S.
Kenna T.J.
Kere J.
Khachigian L.
Kiryu H.
Kishima M.
Kitajima H.
Kitamura T.
Kitano H.
Klaric E.
Klepper K.
Klinken S.P.
Kloppmann E.
Knox A.J.
Kodama Y.
Kogo Y.
Kojima M.
Kojima S.
Kojima-Ishiyama Miki
Komatsu N.
Komiyama H.
Kono T.
Koseki H.
Koyasu S.
Kratz A.
Kukalev A.
Kulakovskiy I.
Kundaje A.
Kunikata H.
Kuo R.
Kuo T.
Kuraku S.
Kuznetsov V.A.
Kwon T.J.
Larouche M.
Lassmann T.
Laurent G.S.
Law A.
Le-Cao K.-A.
Lecellier C.-H.
Lecellier C.-H.
Lee W.
Lenhard B.
Lennartsson A.
Li K.
Li R.
Lilje B.
Lipovich L.
Lizio M.
Lopez G.
Magi S.
Mak G.K.
Makeev V.
Manabe R.
Mandai M.
Mar J.
Maruyama K.
Maruyama T.
Mason E.
Mathelier A.
Matsuda H.
Medvedeva Y.A.
Meehan T.F.
Mejhert N.
Menichelli Christophe
Meynert A.
Mikami N.
Minoda A.
Miura H.
Miyagi Y.
Miyawaki A.
Mizuno Y.
Morikawa H.
Morimoto M.
Morioka M.
Morishita S.
Moro K.
Motakis E.
Motohashi H.
Mukarram A.K.
Mummery C.L.
Mungall C.J.
Murakawa Y.
Muramatsu M.
Murata Mitsuyoshi
Nagasaka K.
Nagase T.
Nakachi Y.
Nakahara F.
Nakai K.
Nakamura K.
Nakamura Y.
Nakamura Y.
Nakazawa T.
Nason G.P.
Nepal C.
Nguyen Q.H.
Nielsen L.K.
Nishida K.
Nishiguchi K.M.
Nishiyori H.
Nishiyori-Sueki Hiromi
Nitta K.
Noguchi Shuhei
Noma Shohei
Notredame C.
Ogishima S.
Ohkura N.
Ohno H.
Ohshima M.
Ohtsu T.
Okada Y.
Okada-Hatakeyama M.
Okazaki Y.
Oksvold P.
Orlando V.
Ow G.S.
Ozturk M.
Pachkov M.
Paparountas T.
Parihar S.P.
Park S.-J.
Pascarella G.
Passier R.
Persson H.
Philippens I.H.
Piazza S.
Plessy C.
Pombo A.
Ponten F.
Poulain S.
Poulsen T.M.
Pradhan S.
Prezioso C.
Pridans C.
Qin X.-Y.
Quackenbush J.
Rackham O.
Ramilowski Jordan A.
Ravasi T.
Rehli M.
Rennie S.
Rito T.
Rizzu P.
Robert C.
Roos M.
Rost B.
Roudnicky F.
Roy R.
Rye M.B.
Sachenkova O.
Saetrom P.
Sai H.
Saiki S.
Saito A.
Saito M.
Sakaguchi S.
Sakai M.
Sakaue S.
Sakaue-Sawano A.
Sandelin A.
Sano H.
Saraswat Manu
Sasamoto Y.
Sato H.
Saxena A.
Saya H.
Schafferhans A.
Schmeier S.
Schmidl C.
Schmocker D.
Schneider C.
Schueler M.
Schultes E.A.
Schulze-Tanzil G.
Semple C.A.
Seno S.
Seo W.
Sese J.
Severin Jessica
Sheng G.
Shi J.
Shimoni Y.
Shin J.W.
Simon-Sanchez J.
Sivertsson A.
Sjostedt E.
Soderhall C.
Stoiber M.H.
Sugiyama D.
Sui S.H.
Summers K.M.
Suzuki A.M.
Suzuki Harukazu
Suzuki K.
Suzuki M.
Suzuki N.
Suzuki T.
Swanson D.J.
Swoboda R.K.
Tagami Michihira
Taguchi A.
Takahashi H.
Takahashi M.
Takamochi K.
Takeda S.
Takenaka Y.
Tam K.T.
Tanaka H.
Tanaka R.
Tanaka Y.
Tang D.
Taniuchi I.
Tanzer A.
Tarui H.
Taylor M.S.
Terada A.
Terao Y.
Testa A.C.
Thomas M.
Thongjuea S.
Tomii K.
Toyoda H.
Triglia E.T.
Tsang H.G.
Tsujikawa M.
Uhlén M.
Valen E.
van de Wetering M.
van Nimwegen E.
Velmeshev D.
Verardo R.
Vitezic M.
Vitting-Seerup K.
von Feilitzen K.
Voolstra C.R.
Vorontsov I.E.
Wahlestedt C.
Wasserman Wyeth W.
Watanabe K.
Watanabe S.
Wells C.A.
Winteringham L.N.
Wolvetang E.
Yabukami H.
Yagi K.
Yamada T.
Yamaguchi Y.
Yamamoto M.
Yamamoto Y.
Yamamoto Y.
Yamanaka Y.
Yano K.
Yasuzawa K.
Yatsuka Y.
Yo M.
Yokokura S.
Yoneda M.
Yoshida E.
Yoshida Y.
Yoshihara M.
Young R.
Young R.S.
Yu N.Y.
Yumoto N.
Zabierowski S.E.
Zhang P.G.
Zucchelli S.
Zwahlen M.
’t Hoen P.A.C.
Publication venue: Nature Publishing Group
Publication date: 15/12/2020

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

A Bayesian Approach for the Clustering of Short Time Series

Author: Laurent Bréhélin
Publication venue: Lavoisier
Publication date

Crossref