148 research outputs found

Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources

Author: Morgenstern Burkhard
Schöffmann Oliver
Stanke Mario
Waack Stephan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: To improve gene prediction, extrinsic evidence on gene structure can be collected from various sources, such as genome-genome comparisons and EST and protein alignments. However, such evidence is often incomplete and usually uncertain: it is rarely sufficient to recover the complete structure of every gene, and what is available is often unreliable. Therefore, extrinsic evidence is most valuable when it is balanced with sequence-intrinsic evidence. RESULTS: We present a fairly general method for integrating external information. Our method evaluates hints to potentially protein-coding regions by means of a Generalized Hidden Markov Model (GHMM) that takes both intrinsic and extrinsic information into account. We used this method to extend the ab initio gene prediction program AUGUSTUS into a versatile tool that we call AUGUSTUS+. In this study, we focus on hints derived from matches to an EST or protein database, but our approach can be used to include arbitrary user-defined hints. Our method is only moderately affected by the length of a database match. Further, it exploits the information that can be derived from the absence of such matches. As a special case, AUGUSTUS+ can predict genes under user-defined constraints, e.g. if the positions of certain exons are known. With hints from EST and protein databases, our new approach was able to predict 89% of the exons in human chromosome 22 correctly. CONCLUSION: Sensitive probabilistic modeling of extrinsic evidence such as sequence database matches can increase gene prediction accuracy. When a match of a sequence interval to an EST or protein sequence is used, it should be treated as compound information rather than as information about individual positions.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Annotation of Fusarium graminearum (PH-1) version 5.0

Author: Hammond-Kosack K. E.
King R.
Urban M.
Publication venue: American Society for Microbiology
Publication date: 01/01/2017

Fusarium graminearum floral infections are a major risk to the global supply of safe cereal grains. We report updates to the PH-1 reference genome and significant improvements to the annotation. Changes include the introduction of legacy annotation identifiers, new gene models, secretome and EffectorP predictions, and the inclusion of extensive untranslated region (UTR) annotation.

Crossref

PubMed Central

Rothamsted Repository

AUGUSTUS: ab initio prediction of alternative transcripts

Author: Gunduz Irfan
Hayes Alec
Keller Oliver
Morgenstern Burkhard
Stanke Mario
Waack Stephan
Publication venue: Oxford University Press
Publication date: 01/01/2006

AUGUSTUS is a software tool for gene prediction in eukaryotes based on a Generalized Hidden Markov Model, a probabilistic model of a sequence and its gene structure. Like most existing gene finders, the first version of AUGUSTUS returned one transcript per predicted gene and ignored the phenomenon of alternative splicing. Herein, we present a WWW server for an extended version of AUGUSTUS that is able to predict multiple splice variants. To our knowledge, this is the first ab initio gene finder that can predict multiple transcripts. In addition, we offer a motif searching facility, where user-defined regular expressions can be searched against putative proteins encoded by the predicted genes. The AUGUSTUS web interface and the downloadable open-source stand-alone program are freely available from

CiteSeerX

Crossref

PubMed Central

HMMConverter 1.0: a toolbox for hidden Markov models

Author: Altschul
Baum
Birney
Churbanov
Durbin
Finn
Hirschberg
Hosking
Irmtraud M. Meyer
Lam
Lunter
Meyer
Miklós
Nguyen
Schütz
Searls
Stanke
Steffen
Tin Yin Lam
Viterbi
Publication venue: Oxford University Press
Publication date: 01/01/2009

Hidden Markov models (HMMs) and their variants are widely used in bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model's architecture. Implementing the prediction algorithms and the algorithms that train the model's parameters, however, can be a time-consuming and error-prone task. Here we present HMMConverter, a software package for setting up probabilistic HMMs, pair-HMMs as well as generalized HMMs and pair-HMMs. The user defines the model itself and the algorithms to be used via an XML file, which is then directly translated into efficient C++ code. The software package provides linear-memory prediction algorithms, such as the Hirschberg algorithm, banding and the integration of prior probabilities, and is the first to present computationally efficient linear-memory algorithms for automatic parameter training. Users of HMMConverter can thus set up complex applications with a minimum of effort and also perform parameter training and data analyses for large data sets.

mGene.web: a web service for accurate computational gene finding

Author: A. Zien
Bernal
Besemer
Brent
C. S. Ong
G. Ratsch
G. Schweikert
G. Zeller
J. Behr
S. Sonnenburg
Salamov
Publication venue: Oxford University Press
Publication date: 01/01/2009

We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users can also train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).

Edinburgh Research Explorer

MPG.PuRe

High-quality genome assembly of Capsella bursa-pastoris reveals asymmetry of regulatory elements at early stages of polyploid genome evolution

Author: Besedina E.
Fedotova A.
Gerasimov E.
Kasianov A.
Klepikova A.
Kondrashov A.
Kulakovskiy I.
Logacheva M.
Penin A.
Publication date: 01/01/2017

© 2017 The Authors. The Plant Journal © 2017 John Wiley & Sons Ltd. Polyploidization and subsequent sub- and neofunctionalization of duplicated genes represent a major mechanism of plant genome evolution. Capsella bursa-pastoris, a widespread ruderal plant, is a recent allotetraploid and, thus, is an ideal model organism for studying early changes following polyploidization. We constructed a high-quality assembly of the C. bursa-pastoris genome and a transcriptome atlas covering a broad sample of organs and developmental stages (available online at http://travadb.org/browse/Species=Cbp). We demonstrate that expression of homeologs is mostly symmetric between subgenomes, and identify a set of homeolog pairs with discordant expression. Comparison of promoters within such pairs revealed emerging asymmetry of regulatory elements. Among them are multiple binding sites for transcription factors controlling the regulation of photosynthesis and plant development by light (PIF3, HY5) and cold stress response (CBF). These results suggest that polyploidization in C. bursa-pastoris enhanced its plasticity of response to light and temperature, and allowed substantial expansion of its distribution range.

Kazan Federal University Digital Repository

Companion: a web server for annotation and analysis of parasite genomes

Author: Berriman Matthew
Brunk Brian
Foth Bernardo
Hertz-Fowler Christiane
Otto Thomas D.
Silva-Franco Fatima
Steinbiss Sascha
Publication venue: Oxford University Press (OUP)
Publication date: 21/04/2016

Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared to SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there are currently no automated eukaryotic annotation pipelines offering the required range of results to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection, up to the generation of output ready to be submitted to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first-pass visualization of both annotation and analysis results. To address these needs, we have developed the Companion web server (http://companion.sanger.ac.uk), which provides parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results compared to manually annotated references.

University of Liverpool Repository

Crossref

PubMed Central

Enlighten

Re-annotation of the woodland strawberry (Fragaria vesca) genome

Author: Janet P Slovin
Nadim W Alkharouf
Omar Darwish
Rachel Shahan
Zhongchi Liu
Publication venue: Springer Nature
Publication date: 01/01/2015

Fragaria vesca is a low-growing, small-fruited diploid strawberry species commonly called woodland strawberry. It is native to temperate regions of Eurasia and North America and, while it produces edible fruits, it is most useful as an experimental perennial plant system that can serve as a model for the agriculturally important Rosaceae family. A draft of the F. vesca genome sequence was published in 2011 [Nat Genet 43:223,2011]. The first-generation annotation (version 1.1) was developed using GeneMark-ES+ [Nuc Acids Res 33:6494,2005], which is a self-training gene prediction tool that relies primarily on the combination of ab initio predictions with mapping high confidence ESTs in addition to mapping gene deserts from transposable elements. Based on over 25 different tissue transcriptomes, we have revised the F. vesca genome annotation, thereby providing several improvements over version 1.1. The new annotation, which was achieved using Maker, describes many more predicted protein coding genes compared to the GeneMark generated annotation that is currently hosted at the Genome Database for Rosaceae (http://www.rosaceae.org/). Our new annotation also results in an increase in the overall total coding length and in the number of coding regions found. The total number of gene predictions that do not overlap with the previous annotations is 2286, most of which were found to be homologous to other plant genes. We have experimentally verified one of the new gene model predictions to validate our results. Using the RNA-Seq transcriptome sequences from 25 diverse tissue types, the re-annotation pipeline improved existing annotations by increasing the annotation accuracy based on extensive transcriptome data. It uncovered new genes, added exons to current genes, and extended or merged exons. This complete genome re-annotation will significantly benefit functional genomic studies of the strawberry and other members of the Rosaceae. https://doi.org/10.1186/s12864-015-1221-

Crossref

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Comparative Genomics Suggests that the Fungal Pathogen Pneumocystis Is an Obligate Parasite Scavenging Amino Acids from Its Host's Lungs

Author: A Nahimana
A Omsland
AE Wakefield
AM Schnoes
BL Cantarel
C Atzori
C Demanche
CF Thomas Jr
CM Aliouat-Denis
Dominique Sanglard
E Birney
ES Kaneshiro
F Ewann
F Gigliotti
Frédéric X. Burdet
G Kutty
HE Ambrose
I Korf
J Hau
J Sugiyama
Jason E. Stajich
JL Davis
JO Andersson
JR Stringer
KR Sakharkar
Laurent Keller
M Basselin
M Nowrousian
M Stanke
Marco Pagni
MG Rodrigues
MH Choi
MJ Gardner
MT Cushion
N Corradi
Ousmane H. Cissé
Patrick Taffé
Philippe M. Hauser
PJ Keeling
S Okuda
SF Altschul
SH Payne
SP Keely
TM Joffrion
U Reichard
V Ter-Hovhannisyan
Publication venue: Public Library of Science
Publication date: 01/01/2010

Pneumocystis jirovecii is a fungus causing severe pneumonia in immuno-compromised patients. Progress in understanding its pathogenicity and epidemiology has been hampered by the lack of a long-term in vitro culture method. Obligate parasitism of this pathogen has been suggested on the basis of various features but remains controversial. We analysed the 7.0 Mb draft genome sequence of the closely related species Pneumocystis carinii infecting rats, which is a well established experimental model of the disease. We predicted 8’085 (redundant) peptides and 14.9% of them were mapped onto the KEGG biochemical pathways. The proteome of the closely related yeast Schizosaccharomyces pombe was used as a control for the annotation procedure (4’974 genes, 14.1% mapped). About two thirds of the mapped peptides of each organism (65.7% and 73.2%, respectively) corresponded to crucial enzymes for the basal metabolism and standard cellular processes. However, the proportion of P. carinii genes relative to those of S. pombe was significantly smaller for the “amino acid metabolism” category of pathways than for all other categories taken together (40 versus 114 against 278 versus 427, P<0.002). Importantly, we identified in P. carinii only 2 enzymes specifically dedicated to the synthesis of the 20 standard amino acids. By contrast all the 54 enzymes dedicated to this synthesis reported in the KEGG atlas for S. pombe were detected upon reannotation of S. pombe proteome (2 versus 54 against 278 versus 427, P<0.0001). This finding strongly suggests that species of the genus Pneumocystis are scavenging amino acids from their host's lung environment. Consequently, they would have no form able to live independently from another organism, and these parasites would be obligate in addition to being opportunistic. These findings have implications for the management of patients susceptible to P. jirovecii infection given that the only source of infection would be other humans

Public Library of Science (PLOS)

Crossref

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central