Search CORE

35,961 research outputs found

REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

Author: Giess Adam
Jonckheere Veronique
Menschaert Gerben
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017
Field of study

Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/ REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames

Ghent University Academic Bibliography

The comprehensive microbial resource

Author: Alice
Alice
Altschul
Ansong
Anuradha Ganapathy
Ashburner
Bairoch
Barrett
Beiko
Benson
Chandonia
Chiu
Clarke
Delcher
Delcher
Dethlefsen
Ducey
Durot
Erin Beck
Finn
Gibbons
Granger Sutton
Griffiths-Jones
Haft
Haft
Hulo
Humbert
Johnson
Kanehisa
Karp
Kersey
Kevin Galinsky
Klimke
Lone
Lowe
Maltsev
Mamirova
Mandel
Marienhagen
Mulder
Nicely
Nikhat Zafar
Owen White
Parks
Phil Goetz
Poole
Qi Yang
Ramana Madupu
Riley
Robert Montgomery
Roca
Rouillard
Schuijffel
Slater
Sonnhammer
Tanja Davidsen
Tatusov
Webb
Xiang
Publication venue: Oxford University Press
Publication date
Field of study

The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions

Crossref

PubMed Central

Recommended from our members

Automated annotation of Caenorhabditis mitochondrial genomes and phylogenetic analysis

Author: Campbell Jessica L.
Publication venue: 'Oregon State University'
Publication date
Field of study

Motivation Relatively little is known about the evolution of mitochondrial genomes between Caenorhabditis species despite decades of research. Both the worldwide search for new nematode species and the effort to increase sequencing speeds require automated tools to quickly characterize and categorize large amounts of data. Results We developed an automated annotation tool for nematode mitochondrial genome sequences that characterizes individual genes in addition to pseudogenic and intergenic regions. The automated annotation tool utilizes the ClustalW multiple sequence alignment program to produce gene alignments of input genomic sequence versus literature C. elegans mitochondrial genes. A phylogenetic analysis was conducted by using RAxML 7.2.6 to create a maximum-likelihood tree with bootstrap analysis for 84 mitochondrial genomic sequences representing 23 Caenorhabditis species and 1 outgroup. Conclusion The automated annotation tool for Caenorhabditis mitochondrial genome sequences is fast and effective. The phylogram resulting from phylogenetic analysis provided preliminary insights into the evolutionary relationships among several Caenorhabditis species' mitochondrial genomes

ScholarsArchive@OSU

GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

Author: Gkogkas Christos G
Hamodrakas Stavros J
Promponas Vasilis J
Vernikos Georgios S
Publication venue: BioMed Central
Publication date: 31/10/2003
Field of study

BACKGROUND: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. RESULTS: GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. CONCLUSIONS: GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating systems, provided that the appropriate Java Runtime Environment is already installed in the system

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella

Author: McConkey G.A.
Pinney J.W.
Shirley M.W.
Westhead D.R.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2005
Field of study

The metabolic SearcH And Reconstruction Kit (metaSHARK) is a new fully automated software package for the detection of enzyme-encoding genes within unannotated genome data and their visualization in the context of the surrounding metabolic network. The gene detection package (SHARKhunt) runs on a Linux systemand requires only a set of raw DNA sequences (genomic, expressed sequence tag and/ or genome survey sequence) as input. Its output may be uploaded to our web-based visualization tool (SHARKview) for exploring and comparing data from different organisms. We first demonstrate the utility of the software by comparing its results for the raw Plasmodium falciparum genome with the manual annotations available at the PlasmoDB and PlasmoCyc websites. We then apply SHARKhunt to the unannotated genome sequences of the coccidian parasite Eimeria tenella and observe that, at an E-value cut-off of 10(-20), our software makes 142 additional assertions of enzymatic function compared with a recent annotation package working with translated open reading frame sequences. The ability of the software to cope with low levels of sequence coverage is investigated by analyzing assemblies of the E.tenella genome at estimated coverages from 0.5x to 7.5x. Lastly, as an example of how metaSHARK can be used to evaluate the genomic evidence for specific metabolic pathways, we present a study of coenzyme A biosynthesis in P.falciparum and E.tenella

Crossref

PubMed Central

White Rose Research Online

Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

Author: Antonio Baltazar A.
Aono Hideo
Apweiler Rolf
Barrero Roberto A.
Bruskiewich Richard
Bureau Thomas
Burr Benjamin
Burr Frances
Costa de Oliveira Antonio
Fujii Yasuyuki
Fuks Galina
Gojobori Takashi
Habara Takuya
Haberer Georg
Han Bin
Harada Erimi
Higo Kenichi
Hilton Phillip B.
Hiraki Aiko T.
Hirochika Hirohiko
Hoen Douglas
Hokari Hiroki
Hosokawa Satomi
Hsing Yue
Ikawa Hiroshi
Ikeo Kazuho
Imanishi Tadashi
Ito Yukiyo
Itoh Takeshi
Jaiswal Pankaj
Kanno Masako
Kawahara Yosihiro
Kawamura Toshiyuki
Kawashima Hiroaki
Khurana Jitendra P.
Kikuchi Shoshi
Komatsu Setsuko
Koyanagi Kanako O.
Kubooka Hiromi
Liberherr Damien
Lin Yao-Cheng
Lonsdale David
Matsumoto Takashi
Matsuya Akihiro
McCombie W. Richard
Messing Joachim
Miyao Akio
Mulder Nicola
Nagamura Yoshiaki
Nam Jongmin
Namiki Nobukazu
Numa Hisataka
Nurimoto Shin
O'Donovan Claire
Ohyanagi Hajimi
Okido Toshihisa
OOta Satoshi
Osato Naoki
Palmer Lance E.
Quetier Francis
Raghuvanshi Surabh
Saichi Naomi
Sakai Hiroaki
Sakai Yasumichi
Sakata Katsumi
Sakurai Tetsuya
Saski Takuji
Sato Fumihiko
Sato Yoshiharu
Schoof Heiko
Seki Motoaki
Shibata Katsumi
Shibata Michie
Shimizu Yuji
Shinozaki Kazuo
Shinso Yuji
Singh Nagendra K.
Smith-White Brian
Takeda Jun-ichi
Tanaka Tsuyoshi
Tanino Motohiko
Tatusova Tatiana
Thongjuea Supat
Todokoro Fusano
Tsugane Mika
Tyagi Akhilesh K.
Vanavichit Apichart
Wang Aihui
Wing Rod A.
Yamaguchi Kaori
Yamamoto Mayu
Yamamoto Naoyuki
Yamasaki Chisato
Yu Yeisoo
Zhang Hao
Zhao Qiang
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2007
Field of study

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene

Crossref

PubMed Central

Queensland University of Technology ePrints Archive

Caltech Authors

University of Queensland eSpace

A quick guide for student-driven community genome annotation

Author: Benoit Joshua B.
Brown Susan J.
D'elia Tom
Flores Mirella
Hosmani Prashant S.
Miller Sherry
Mueller Lukas A.
Munoz-Torres Monica
Saha Surya
Shippy Teresa
Wiersma-Koch Helen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/10/2018
Field of study

High quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but these are available for only a handful of model organisms that have undergone extensive curation and experimental validation over the course of many years. The majority of gene models present in biological databases today have been identified in draft genome assemblies using automated annotation pipelines that are frequently based on orthologs from distantly related model organisms. Manual curation is time consuming and often requires substantial expertise, but is instrumental in improving gene model structure and identification. Manual annotation may seem to be a daunting and cost-prohibitive task for small research communities but involving undergraduates in community genome annotation consortiums can be mutually beneficial for both education and improved genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control processes through incremental evaluation. Moreover, it gives students an opportunity to increase their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

FigShare

The Chlamydomonas genome project: A decade on

Author: Aksoy M
Blaby IK
Blaby-Haas CE
Dutcher S
Goodstein D
Grimwood J
Grossman A
Harris EH
Hom EFY
King S
Lopez D
Merchant SS
Porter M
Prochnik S
Schmutz J
Stanke M
Tourasse N
Umen J
Vallon O
Witman GB
Publication venue: eScholarship, University of California
Publication date: 01/10/2014
Field of study

The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis, and micronutrient homeostasis. Ten years since its genome project was initiated an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the omics era. Housed at Phytozome, the plant genomics portal of the Joint Genome Institute (JGI), the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of whole transcriptome sequencing (RNA-Seq) data. We present here the past, present, and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions, and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes

PubMed Central

eScholarship - University of California

A Factor Graph Approach to Automated GO Annotation

Author: Elizabeth Tapia
Fernando Roda
Flavia Krsticevic
Flavio E Spetale
Pilar Bulacio
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016
Field of study

As volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.Fil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin

CONICET Digital

Directory of Open Access Journals

PubMed Central

FigShare

Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system

Author: Brusic Vladimir
Nagashima Takeshi
Petrovsky Nikolai
Schonbach Christian
Silva Diego
Socha L
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/12/2015
Field of study

BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. RESULTS: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders. CONCLUSIONS: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets

The Australian National University