Table1_AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.XLSX
Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from FASTA (i) nucleotide or (ii) protein sequences or (iii) structural annotation files (GFF3), users can also input FASTQ RNA-seq data and MS/MS data in mzXML or similar formats; the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene prediction, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using AnnotaPipeline, resulting in a higher proportion of annotated proteins and a reduced proportion of hypothetical proteins when compared to the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.
Table2_AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.docx
Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from FASTA (i) nucleotide or (ii) protein sequences or (iii) structural annotation files (GFF3), users can also input FASTQ RNA-seq data and MS/MS data in mzXML or similar formats; the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene prediction, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using AnnotaPipeline, resulting in a higher proportion of annotated proteins and a reduced proportion of hypothetical proteins when compared to the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.
Classification of ESTs in Gene Ontology category.
<p>The ESTs of <i>L. longipalpis</i> were submitted to a search against the three categories of Gene Ontology (NCBI). The e-value cutoff was 1.0e-5.</p>
β-defensin sequence analysis.
<p>(A) Neighbor-joining tree of putative β-defensins: <i>L. longipalpis</i> 1 (BAR005E03/AM091821, male reproductive organs and whole female cDNA libraries), <i>L. longipalpis</i> 2 (EU124626, midgut female library) and <i>L. longipalpis</i> 3 (EX211140, midgut female library), <i>A. aegypti</i> (AEL009861), <i>A. gambiae</i> (AGAP007049), <i>D. melanogaster</i> (CG10433), and <i>B. mori</i> (NP_001106745). Bootstrap percentage values indicated in nodes are based on 1000 replicates. (B) Multiple alignment of putative β-defensin of male reproductive tracts from <i>L. longipalpis</i> and its orthologues in Diptera. Conserved amino acids are indicated by (*).</p>
Cyclophilin sequence analysis.
<p>(A) Neighbor-joining tree of putative cyclophilin <i>L. longipalpis</i> (RAAPBAR022E08/AM092289, male reproductive organs and whole female cDNA libraries), <i>A. gambiae</i> (AGAP007088-PA), <i>A. aegypti</i> (AAEL013279), <i>D. melanogaster</i> (FBpp0071844/CG2852) and <i>A. mellifera</i> (NP_001229473). Bootstrap percentage values indicated in nodes are based on 1000 replicates. (B) Multiple alignment of putative cyclophilin of male reproductive tracts from <i>L. longipalpis</i> and its orthologues in Diptera. Conserved amino acids are indicated by (*).</p>
Putative <i>L. longipalpis</i> mRGPs.
<p><b>N-</b> Number of reads. <b>DB</b>- Database. PTN- Protein. COEBE4D- carboxylesterase, beta esterase. Crisp- Cysteine-rich secreted proteins.</p>*<p>ESTs that have yielded best matches to mRGPs/Acps from protein databases (three against <i>A. aegypti</i> and two against <i>A. gambiae</i>). AGAP00 sequences come from AgamP3.6_vectorbase and AAEL0 sequences come from AaegL1.2_vectorbase.</p>
Protease inhibitor sequence analysis.
<p>(A) Neighbor-joining tree of putative protease inhibitor <i>L. longipalpis</i> (RAAPBAR023H02/EW989852, male reproductive organs and midgut female cDNA library), <i>A. aegypti</i> (AAEL000551), <i>A. gambiae</i> (AGAP011319), and <i>Apis mellifera</i> (XP_003250953). Bootstrap percentage values indicated in nodes are based on 1000 replicates. (B) Multiple alignment of putative protease inhibitor of male reproductive tracts from <i>L. longipalpis</i> and its orthologues in Diptera. Conserved amino acids are indicated by (*).</p>
Astacin metalloprotease sequence analysis.
<p>(A) Neighbor-joining tree of putative astacin from <i>L. longipalpis</i> (RAAPBAR022F08, male reproductive organs cDNA libraries), <i>L. longipalpis</i> 2 (AM088883, whole female cDNA libraries) and <i>L. longipalpis</i> 3 (Lulo-Astacin A8CW49_LUTLO, midgut female library), <i>A. aegypti</i> (AAEL013449), <i>A. gambiae</i> (AGAP010764), <i>D. melanogaster</i> (FBpp0080341/CG15254) and <i>Nasonia vitripennis</i> (NV12552). Bootstrap percentage values indicated in nodes are based on 1000 replicates. (B) Multiple alignment of putative astacin of male reproductive tracts from <i>L. longipalpis</i> and its orthologues in Diptera. Conserved amino acids are indicated by (*).</p>
Thioredoxin sequence analysis.
<p>(A) Neighbor-joining tree of putative thioredoxin <i>L. longipalpis</i> (RAAPBAR020D12, male reproductive organs cDNA libraries), <i>A. aegypti</i> (AAEL010777), <i>A. gambiae</i> (AGAP009584-PA) and <i>Tribolium castaneum</i> (XM_962894.2). Bootstrap percentage values indicated in nodes are based on 1000 replicates. (B) Multiple alignment of putative thioredoxin of male reproductive tracts from <i>L. longipalpis</i> and its orthologues in Diptera. Conserved amino acids are indicated by (*).</p>
ESTs with other specific function.
<p><b>N-</b> Number of reads. OBP- Odorant Binding Protein.</p>