Fraunhofer-ePrints

Caltech Authors

MPG.PuRe

A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons

Author: A Siepel
AA Mironov
B Modrek
BJ Haas
CW Sugnet
D Boffelli
D Brett
DL Black
DL Philipps
G Dror
G Rätsch
GW Yeo
H Nagasaki
I Korf
J Felsenstein
JD McAuliffe
JE Allen
JM Johnson
Jonathan E Allen
JS Pedersen
L Cartegni
L Croft
M Alexandersson
M Hasegawa
M Hiller
P Carninci
Q Xu
R Sorek
RA Drysdale
RC Edgar
SL Cawley
SS Gross
Steven L Salzberg
T Maniatis
U Ohler
Z Kan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: An important challenge in eukaryotic gene prediction is accurate identification of alternatively spliced exons. Functional transcripts can go undetected in gene expression studies when alternative splicing occurs only under specific biological conditions. Non-expression-based computational methods support the identification of rarely expressed transcripts. RESULTS: A non-expression-based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The method is implemented in the program ExAlt, and an analysis of prediction accuracy is given for Drosophila melanogaster. CONCLUSION: ExAlt identifies the structure of most alternatively spliced exons in the test set, and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on Drosophila genomes to search for new cases of alternative splicing.
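ExAlt's phylogenetic generalized HMM is far richer than what fits here, but the decoding idea underneath it can be sketched with a plain two-state HMM and the Viterbi algorithm. This is a simplification under stated assumptions, not ExAlt's model. The states `E`/`I`, the GC-biased emissions and all probabilities are illustrative. There are no explicit state durations (the "generalized" part of a GHMM), and no cross-species conservation evidence is used.

```python
import math

# Illustrative two-state model (assumption, not ExAlt's parameters):
# "E" = exon-like (GC-richer), "I" = intron-like (AT-richer).
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1},
         "I": {"E": 0.1, "I": 0.9}}
EMIT = {"E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}


def viterbi(seq: str) -> str:
    """Return the most probable state label for each base (log-space Viterbi)."""
    if not seq:
        return ""
    log = math.log
    # score[s]: best log-probability of any labelling of the prefix ending in s
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []  # one dict per later position: state -> best previous state
    for base in seq[1:]:
        ptr, new_score = {}, {}
        for s in STATES:
            best_p, best_r = max((score[r] + log(TRANS[r][s]), r) for r in STATES)
            new_score[s] = best_p + log(EMIT[s][base])
            ptr[s] = best_r
        backptrs.append(ptr)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=score.__getitem__)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GCGCGCGC" + "ATATATATAT"))  # → EEEEEEEEIIIIIIIIII
```

Working in log space avoids numerical underflow on long sequences. A real gene finder would replace the two toy states with exon, intron and splice-signal submodels.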

Springer - Publisher Connector

Digital Repository at the University of Maryland

Workflow-based systematic design of high throughput genome annotation

Author: Wu Xikun
Publication venue: Computing, Imperial College London
Publication date: 01/10/2009

The genus Eimeria belongs to the phylum Apicomplexa, which includes many obligate intracellular protozoan parasites of humans and livestock. E. tenella is one of seven species that infect the domestic chicken and cause the intestinal disease coccidiosis, which is economically important for the poultry industry. E. tenella is highly pathogenic and is often used as a model species for studies of Eimeria biology. In this PhD thesis, a comprehensive annotation system named "WAGA" (Workflow-based Automatically Genome Annotation) was built and applied to the E. tenella genome. InforSense KDE and its BioSense plug-in (products of the InforSense Company) were the core software used to build the workflows. Workflows were made by integrating individual bioinformatics tools into a single platform. Each workflow was designed to provide a standalone service for a particular task. Three major workflows were developed based on the genomic resources currently available for E. tenella: EST-based gene construction, HMM-based gene prediction and protein-based annotation. Finally, a combining workflow was built to sit above the individual ones and generate a set of automatic annotations using all of the available information. The overall system and its three major components were deployed as web servers that are fully tuneable and reusable by end users. WAGA does not require users to have programming skills or knowledge of the underlying algorithms or mechanisms of its low-level components. E. tenella was the target genome here, and all results obtained were displayed in GBrowse. A sample of the results was selected for experimental validation. For evaluation purposes, WAGA was also applied to another apicomplexan parasite, Plasmodium falciparum, the causative agent of human malaria, which has been extensively annotated. The results obtained were compared with gene predictions from PHAT, a gene finder designed for and used in the P. falciparum genome project.

Spiral - Imperial College Digital Repository

Genome-Wide Association between Branch Point Properties and Alternative Splicing

Author: A Corvelo
A Deirdre
A Loytynoja
André Corvelo
B Modrek
B Patterson
B Rhead
B Ruskin
BR Graveley
C Burge
C Gooding
CF Bourgeois
Christopher W. J. Smith
CJ Coolidge
CW Smith
D Libri
DD Licatalosi
DL Black
DM Helfman
DM Kupfer
E Blanco
E Bon
Eduardo Eyras
F Clark
G Kol
G Yeo
GJ Mulligan
HX Liu
IL Hofacker
Irmtraud M. Meyer
J Southby
K Gao
M Goux-Pelletan
M Hallegger
M Plass
M Stanke
MA Garcia-Blanco
Martina Hallegger
MB Stadler
MC Wollerton
MR Green
MS Jurica
N Bellora
NA Faustino
R Castelo
R Reed
SH Schwartz
T Joachims
T Maniatis
TW Nilsen
WG Fairbrother
WJ Kent
X Xiao
XH Zhang
Z Wang
Publication venue: Public Library of Science
Publication date: 01/01/2010

The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif, combined with the lack of a large set of experimentally verified BPs, complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias, we obtained a set of motifs that agree well with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in humans. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this element, so far disregarded yet crucial, carries information that can complement current acceptor site models.
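The dependence of BP position on AG dinucleotides can be illustrated by two small sequence scans. The first measures the AG-free stretch (the AG exclusion zone) just upstream of the 3′ splice site. The second lists candidate BPs matching the mammalian yUnAy consensus. This is a minimal sketch under stated assumptions: the intron is given 5′→3′ in the DNA alphabet and ends with the canonical AG, and the consensus pattern and 100-nt window are illustrative choices. It is not the authors' SVM model, which additionally scores conservation, positional bias and polypyrimidine-tract features.

```python
import re

# Mammalian branch-point consensus yUnAy in DNA letters (y = C/T, n = any);
# the zero-width lookahead lets overlapping matches be reported.
BP_MOTIF = re.compile(r"(?=[CT]T[ACGT]A[CT])")


def ag_exclusion_zone(intron: str) -> int:
    """Length of the AG-free stretch just upstream of the 3' splice-site AG.

    Assumes `intron` is written 5'->3' and ends with the canonical 'AG',
    which itself is not counted.
    """
    seq = intron.upper()
    if not seq.endswith("AG"):
        raise ValueError("intron must end with the 3' splice-site AG")
    body = seq[:-2]
    last_ag = body.rfind("AG")
    # No upstream AG at all: the whole remaining intron is AG-free.
    return len(body) if last_ag == -1 else len(body) - (last_ag + 2)


def candidate_bps(intron: str, window: int = 100) -> list[int]:
    """For each yUnAy match whose first base lies in the last `window` nt,
    return the number of nucleotides downstream of its branch-point A."""
    seq = intron.upper()
    start = max(0, len(seq) - window)
    hits = []
    for m in BP_MOTIF.finditer(seq, start):
        a_pos = m.start() + 3  # the A is the 4th position of yUnAy
        hits.append(len(seq) - 1 - a_pos)
    return hits


example = "GTAAGTCTAACTTTTTTTTCAG"
print(ag_exclusion_zone(example), candidate_bps(example))  # → 15 [12]
```

In this toy intron, the only yUnAy match (CTAAC) places its A inside the 15-nt AG-free stretch. The two scans report exactly the kind of positional relationship the abstract describes.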

CiteSeerX

Oxford University Research Archive

UCL Discovery

UPF Digital Repository

Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs

Author: Baggerman Geert
Crappé Jeroen
Hayakawa Eisuke
Luyten Walter
Menschaert Gerben
Trooskens Geert
Van Criekinge Wim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. those encoded by small open reading frames, sORFs) is very difficult, as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in a search for this type of small ORF. Results: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential based on evolutionary conservation at the AA level, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs were finally overlapped with ribosome profiling data, hinting at sORF translation. All candidates were visually inspected using an in-house developed genome browser. In this way, dozens of highly conserved sORFs targeted by ribosomes were identified in the mouse genome, putatively encoding micropeptides. Conclusion: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides.
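The first step, a genome-wide scan for sORFs, amounts to enumerating ATG-to-stop open reading frames shorter than 100 codons on both strands. The sketch below rests on several assumptions: an ATG start, the three standard stop codons, a hypothetical 10-AA minimum length (`min_aa`), and only the longest ORF kept per stop codon. The paper's exact criteria may differ. The conservation scoring, SVM ranking and ribosome-profiling overlap are not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq: str, min_aa: int = 10, max_aa: int = 100):
    """Yield (strand, frame, start, end, aa_len) for ATG...stop ORFs with
    min_aa <= aa_len < max_aa. aa_len excludes the stop codon; coordinates
    on the '-' strand refer to the reverse-complemented sequence.
    min_aa = 10 is a hypothetical default, not the paper's threshold."""
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i
                # Extend codon by codon until an in-frame stop codon.
                while j + 3 <= len(s) and s[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > len(s):
                    break  # ORF runs off the end with no stop: skip it
                aa_len = (j - i) // 3
                if min_aa <= aa_len < max_aa:
                    yield strand, frame, i, j + 3, aa_len
                i = j + 3  # resume after the stop, so nested ATGs are skipped


for hit in find_sorfs("CC" + "ATG" + "AAA" * 10 + "TAA" + "GG"):
    print(hit)
```

Because short ORFs are extremely common by chance, a scan like this produces a very large candidate set. The downstream conservation and ribosome-profiling filters described in the abstract are what make the final set manageable.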

Springer - Publisher Connector

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

A procedure for identifying homologous alternative splicing events

Author: de la Cruz Xavier
Hospital Adam
Orozco Modesto
Talavera David
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The study of the functional role of alternative splice isoforms of a gene is a very active area of research in biology. The difficulty of the experimental approach (in particular, in its high-throughput version) leaves ample room for the development of bioinformatics tools that can provide a useful first picture of the problem. Among the possible approaches, one of the simplest is to follow classical protein function annotation protocols and annotate target alternative splice events with the information available from conserved events in other species. However, applying this protocol requires a procedure capable of recognising such events. Here we present a simple but accurate method developed for this purpose. Results: We have developed a method for identifying homologous, or equivalent, alternative splicing events, based on the combined use of neural networks and sequence searches. The procedure comprises four steps: (i) BLAST search for homologues of the two isoforms defining the target alternative splicing event; (ii) construction of all possible candidate events; (iii) scoring of the latter with a series of neural networks; and (iv) filtering of the results. When tested on a set of 473 manually annotated pairs of homologous events, our method performed well, with an accuracy of 0.99, a precision of 0.98 and a sensitivity of 0.93. When no candidates were available, the specificity of our method varied between 0.81 and 0.91. Conclusion: The method described in this article allows the identification of homologous alternative splicing events with a good success rate, indicating that such a method could be used to develop functional annotation of alternative splice isoforms.
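The accuracy, precision, sensitivity and specificity figures quoted in this abstract follow the usual confusion-matrix definitions. The small helper below spells those definitions out. The counts in the usage line are hypothetical and are not taken from the 473-pair test set.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted events against a reference set."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:  # also called recall
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
c = Confusion(tp=90, fp=10, tn=880, fn=20)
print(c.accuracy(), c.precision(), c.sensitivity(), c.specificity())
```

Accuracy can stay high when negatives vastly outnumber positives, which is why precision and sensitivity are reported alongside it.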

Springer - Publisher Connector