Search CORE

4,319 research outputs found

REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

Author: Giess Adam
Jonckheere Veronique
Menschaert Gerben
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017
Field of study

Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/ REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames

Ghent University Academic Bibliography

Ribosome signatures aid bacterial translation initiation site identification

Author: Chyżyńska Katarzyna
Giess Adam
Jonckheere Veronique
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Background: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites

Ghent University Academic Bibliography

Directory of Open Access Journals

Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication venue
Publication date: 17/10/2005
Field of study

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future

arXiv.org e-Print Archive

HAL-MINES ParisTech

Role of 3′UTRs in the Translation of mRNAs Regulated by Oncogenic eIF4E—A Computational Inference

Author: A Eulalio
A Stark
AC Gingras
AC Gingras
AE Koromilas
Arti N. Santhanam
Bruce A. Shapiro
CC Chang
DH Mathews
DP Bartel
E Bindewald
Eckart Bindewald
G Mathonnet
H Hirling
IL Hofacker
Jason E. Stajich
JB Cowland
JR Graff
JW Hershey
L He
LM Shantz
LS Hon
M Harvey
M Kertesz
M Metzler
MS Kumar
Nahum Sonenberg
Nancy H. Colburn
NK Gray
O Larsson
O Larsson
O Meyuhas
Ola Larsson
PM Voorhoeve
R Sandberg
RC Gentleman
S Griffiths-Jones
S Griffiths-Jones
S Nottrott
SA Shabalina
SF Altschul
SG Zimmer
SH Bernhart
TG Lawson
TJ Macke
VA Polunovsky
Vinagolu K. Rajasekhar
VK Rajasekhar
W Filipowicz
WHWWA Kruskall
Y Akao
Y Mamane
Y Tsuchiya
Z Wu
Z Wu
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

Eukaryotic cap-dependent mRNA translation is mediated by the initiation factor eIF4E, which binds mRNAs and stimulates efficient translation initiation. eIF4E is often overexpressed in human cancers. To elucidate the molecular signature of eIF4E target mRNAs, we analyzed sequence and structural properties of two independently derived polyribosome recruited mRNA datasets. These datasets originate from studies of mRNAs that are actively being translated in response to cells over-expressing eIF4E or cells with an activated oncogenic AKT: eIF4E signaling pathway, respectively. Comparison of eIF4E target mRNAs to mRNAs insensitive to eIF4E-regulation has revealed surprising features in mRNA secondary structure, length and microRNA-binding properties. Fold-changes (the relative change in recruitment of an mRNA to actively translating polyribosomal complexes in response to eIF4E overexpression or AKT upregulation) are positively correlated with mRNA G+C content and negatively correlated with total and 3′UTR length of the mRNAs. A machine learning approach for predicting the fold change was created. Interesting tendencies of secondary structure stability are found near the start codon and at the beginning of the 3′UTR region. Highly upregulated mRNAs show negative selection (site avoidance) for binding sites of several microRNAs. These results are consistent with the emerging model of regulation of mRNA translation through a dynamic balance between translation initiation at the 5′UTR and microRNA binding at the 3′UTR

Crossref

Directory of Open Access Journals

PubMed Central

Translation initiation site prediction on a genomic scale : beauty in simplicity

Author: Borodovsky
Delcher
Fickett
Hatzigeorgiou
Kozak
Kozak
Kozak
Li
Li
Li
Liu
Nishikawa
Pedersen
Salamov
Salzberg
Salzberg
Sven Degroeve
Thomas Abeel
Tiwari
Wang
Yvan Saeys
Yves Van de Peer
Zeng
Zien
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007
Field of study

Motivation: The correct identification of translation initiation sites (TIS) remains a challenging problem for computational methods that automatically try to solve this problem. Furthermore, the lion's share of these computational techniques focuses on the identification of TIS in transcript data. However, in the gene prediction context the identification of TIS occurs on the genomic level, which makes things even harder because at the genome level many more pseudo-TIS occur, resulting in models that achieve a higher number of false positive predictions. Results: In this article, we evaluate the performance of several 'simple' TIS recognition methods at the genomic level, and compare them to state-of-the-art models for TIS prediction in transcript data. We conclude that the simple methods largely outperform the complex ones at the genomic scale, and we propose a new model for TIS recognition at the genome level that combines the strengths of these simple models. The new model obtains a false positive rate of 0.125 at a sensitivity of 0.80 on a well annotated human chromosome ( chromosome 21). Detailed analyses show that the model is useful, both on its own and in a simple gene prediction setting

Crossref

Ghent University Academic Bibliography

Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA

Author: de Souza Teixeira Felipe Carvalho
Nobre Cristiane Neri
Ortega José Miguel
Silva Lívia Márcia
Zárate Luis Enrique
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background The accurate prediction of the initiation of translation in sequences of mRNA is an important activity for genome annotation. However, obtaining an accurate prediction is not always a simple task and can be modeled as a problem of classification between positive sequences (protein codifiers) and negative sequences (non-codifiers). The problem is highly imbalanced because each molecule of mRNA has a unique translation initiation site and various others that are not initiators. Therefore, this study focuses on the problem from the perspective of balancing classes and we present an undersampling balancing method, M-clus, which is based on clustering. The method also adds features to sequences and improves the performance of the classifier through the inclusion of knowledge obtained by the model, called InAKnow. Results Through this methodology, the measures of performance used (accuracy, sensitivity, specificity and adjusted accuracy) are greater than 93% for the <it>Mus musculus</it> and <it>Rattus norvegicus</it> organisms, and varied between 72.97% and 97.43% for the other organisms evaluated: <it>Arabidopsis thaliana</it>, <it>Caenorhabditis elegans</it>, <it>Drosophila melanogaster</it>, <it>Homo sapiens</it>, <it>Nasonia vitripennis</it>. The precision increases significantly by 39% and 22.9% for <it>Mus musculus</it> and <it>Rattus norvegicus</it>, respectively, when the knowledge obtained by the model is included. For the other organisms, the precision increases by between 37.10% and 59.49%. The inclusion of certain features during training, for example, the presence of ATG in the upstream region of the Translation Initiation Site, improves the rate of sensitivity by approximately 7%. Using the M-Clus balancing method generates a significant increase in the rate of sensitivity from 51.39% to 91.55% (<it>Mus musculus</it>) and from 47.45% to 88.09% (<it>Rattus norvegicus</it>). Conclusions In order to solve the problem of TIS prediction, the results indicate that the methodology proposed in this work is adequate, particularly when using the concept of acquired knowledge which increased the accuracy in all databases evaluated.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrated Features by Administering the Support Vector Machine (SVM) of Translational Initiations Sites in Alternative Polymorphic Contex

Author: Burairah Hussin
Burairah Hussin
Nanna Suryana Herman
Nanna Suryana Herman
Nurul Arneida Husin
Publication venue: Bulgarian Academy of Sciences
Publication date: 01/04/2012
Field of study

Many algorithms and methods have been proposed for classification problems in bioinformatics. In this study, the discriminative approach in particular support vector machines (SVM) is employed to recognize the studied TIS patterns. The applied discriminative approach is used to learn about some discriminant functions of samples that have been labelled as positive or negative. After learning, the discriminant functions are employed to decide whether a new sample is true or false. In this study, support vector machines (SVM) is employed to recognize the patterns for studied translational initiation sites in alternative weak context. The method has been optimized with the best parameters selected; c=100, E=10-6 and ex=2 for non linear kernel function. Results show that with top 5 features and non linear kernel, the best prediction accuracy achieved is 95.8%. J48 algorithm is applied to compare with SVM with top 15 features and the results show a good prediction accuracy of 95.8%. This indicates that the top 5 features selected by the IGR method and that are performed by SVM are sufficient to use in the prediction of TIS in weak contexts

Directory of Open Access Journals

Applications of Support Vector Machines in Bioinformatics and Network Security

Author: Rehan Akbani
Turgay Korkmaz
Publication venue: 'IntechOpen'
Publication date: 01/02/2010
Field of study

IntechOpen