6,702 research outputs found

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: M. F. Lin
I. Jungreis
M. Kellis
Publication date: 17/08/2010

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE

Tv-RIO1 – an atypical protein kinase from the parasitic nematode Trichostrongylus vitrinus

Author: Gasser Robin B.
Hu Min
LaRonde-LeBlanc Nicole
Sternberg Paul W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Background: Protein kinases are key enzymes that regulate a wide range of cellular processes, including cell-cycle progression, transcription, DNA replication and metabolic functions. These enzymes catalyse the transfer of phosphates to serine, threonine and tyrosine residues, thus playing functional roles in reversible protein phosphorylation. There are two main groups, namely eukaryotic protein kinases (ePKs) and atypical protein kinases (aPKs); RIO kinases belong to the latter group. While there is some information about RIO kinases and their roles in animals, nothing is known about them in parasites. This is the first study to characterise a RIO1 kinase from any parasite. Results: A full-length cDNA (Tv-rio-1) encoding a RIO1 protein kinase (Tv-RIO1) was isolated from the economically important parasitic nematode Trichostrongylus vitrinus (Order Strongylida). The uninterrupted open reading frame (ORF) of 1476 nucleotides encoded a protein of 491 amino acids, containing the characteristic RIO1 motif LVHADLSEYNTL. Tv-rio-1 was transcribed at the highest level in the third-stage larva (L3), and a higher level in adult females than in males. Comparison with homologues from other organisms showed that protein Tv-RIO1 had significant homology to related proteins from a range of metazoans and plants. Amino acid sequence identity was most pronounced in the ATP-binding motif, active site and metal binding loop. Phylogenetic analyses of selected amino acid sequence data revealed Tv-RIO1 to be most closely related to the proteins in the species of Caenorhabditis. A structural model of Tv-RIO1 was constructed and compared with the published crystal structure of RIO1 of Archaeoglobus fulgidus (Af-Rio1). Conclusion: This study provides the first insights into the RIO1 protein kinases of nematodes, and a foundation for further investigations into the biochemical and functional roles of this molecule in biological processes in parasitic nematodes
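
The ORF and protein lengths quoted in this abstract can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming the 1476-nucleotide count includes the terminal stop codon (the abstract does not say so explicitly); the function name `protein_length` is hypothetical.

```python
CODON_LENGTH = 3
ORF_NUCLEOTIDES = 1476  # value quoted in the abstract


def protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumption: when includes_stop is True, the last codon is a stop codon
    and does not contribute an amino acid.
    """
    if orf_nt % CODON_LENGTH != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // CODON_LENGTH
    return codons - 1 if includes_stop else codons


# 1476 / 3 = 492 codons; dropping the stop codon leaves 491 residues.
print(protein_length(ORF_NUCLEOTIDES))
```

Under that assumption the result agrees with the 491 amino acids reported.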

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Caltech Authors

Digital Repository at the University of Maryland

University of Melbourne Institutional Repository

REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

Author: Giess Adam
Jonckheere Veronique
Menschaert Gerben
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames
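
To illustrate what "evaluates all possible ORFs in the genome" can mean in practice, here is a minimal sketch of exhaustive ORF enumeration on both strands. It is not REPARATION's implementation: the start-codon set (ATG, GTG, TTG), the `min_codons` cutoff, and the function names are assumptions made for illustration, and no Ribo-seq evidence or growth-curve thresholding is modelled.

```python
from typing import Iterator, Tuple

START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 10) -> Iterator[Tuple[str, int, int]]:
    """Yield (strand, start, end) for every start-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned. Every in-frame start upstream of a stop is reported,
    so nested variants sharing one stop codon are all returned.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            open_starts = []  # start codons seen since the last stop in this frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in START_CODONS:
                    open_starts.append(i)
                elif codon in STOP_CODONS:
                    for start in open_starts:
                        # Length is counted in codons, stop codon included.
                        if (i + 3 - start) // 3 >= min_codons:
                            yield strand, start, i + 3
                    open_starts = []


for orf in find_orfs("ATGAAATGA", min_codons=3):
    print(orf)
```

A real pipeline would then score each candidate against experimental evidence; this sketch only produces the candidate set.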

Ghent University Academic Bibliography

Needed for completion of the human genome: hypothesis driven experiments and biologically realistic mathematical models

Author: Birney Ewan
Brent Michael
Crollius Hugues Roest
Dermitzakis Emmanouil
Guigo Roderic
Pachter Lior
Solovyev Victor
Zhang Michael Q.
Publication date: 06/10/2004

With the sponsorship of "Fundacio La Caixa" we met in Barcelona, November 21st and 22nd, to analyze the reasons why, after the completion of the human genome sequence, the identification of all protein coding genes and their variants remains a distant goal. Here we report on our discussions and summarize some of the major challenges that need to be overcome in order to complete the human gene catalog. Comment: Report and discussion resulting from the "Fundacio La Caixa" gene finding meeting held November 21 and 22 2003 in Barcelona

arXiv.org e-Print Archive

Caltech Authors

Global Functional Atlas of _Escherichia coli_ Encompassing Previously Uncharacterized Proteins

Author: Ali Mehrab
Babu Mohan
Butland Gareth
Chandran Shamanta
Christopolous Constantine
Emili Andrew
Eroukova Veronika
Golshani Ashkan
Greenblatt Jack F.
Guao Xinghua
Hu Pingzhao
Janga Sarah Chandra
Moreno-Hagelsieb Gabriel
Musso Gabriela
Nazarians-Armavil Anaies
Nazemof Nazila
Paccanaro Alberto
Phanse Sadhna
Pogoutse Oxana
Wong Peter
Yang Wenhong
Publication venue: Scholars Commons @ Laurier
Publication date: 01/04/2009

One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins

Wilfrid Laurier University

Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum

Author: Bringans Scott
Casey Tammy
Hane James
Lipscombe Richard J
Oliver Richard Peter
Solomon Peter S
Tan Kar-Chun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2016

BACKGROUND: Stagonospora nodorum, a fungal ascomycete in the class dothideomycetes, is a damaging pathogen of wheat. It is a model for necrotrophic fungi that cause necrotic symptoms via the interaction of multiple effector proteins with cultivar-specific receptors. A draft genome sequence and annotation was published in 2007. A second-pass gene prediction using a training set of 795 fully EST-supported genes predicted a total of 10762 version 2 nuclear-encoded genes, with an additional 5354 less reliable version 1 genes also retained. RESULTS: In this study, we subjected soluble mycelial proteins to proteolysis followed by 2D LC MALDI-MS/MS. Comparison of the detected peptides with the gene models validated 2134 genes. 62% of these genes (1324) were not supported by prior EST evidence. Of the 2134 validated genes, all but 188 were version 2 annotations. Statistical analysis of the validated gene models revealed a preponderance of cytoplasmic and nuclear localised proteins, and proteins with intracellular-associated GO terms. These statistical associations are consistent with the source of the peptides used in the study. Comparison with a 6-frame translation of the S. nodorum genome assembly confirmed 905 existing gene annotations (including 119 not previously confirmed) and provided evidence supporting 144 genes with coding exon frameshift modifications, 604 genes with extensions of coding exons into annotated introns or untranslated regions (UTRs), 3 new gene annotations which were supported by tblastn to NR, and 44 potential new genes residing within un-assembled regions of the genome. CONCLUSION: We conclude that 2D LC MALDI-MS/MS is a powerful, rapid and economical tool to aid in the annotation of fungal genomic assemblies
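
The 6-frame translation mentioned in this abstract can be sketched as follows. This is an illustrative example only, assuming the standard genetic code (NCBI translation table 1); the abstract does not state which tools or settings the authors used, and the function names here are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> Dict[Tuple[str, int], str]:
    """Return translations keyed by (strand, frame offset) for all six frames."""
    seq = seq.upper()
    reverse = seq.translate(_COMPLEMENT)[::-1]
    return {
        (strand, frame): translate(s[frame:])
        for strand, s in (("+", seq), ("-", reverse))
        for frame in range(3)
    }


frames = six_frame_translation("ATGGCC")
print(frames[("+", 0)])  # forward strand, frame 0
print(frames[("-", 0)])  # reverse strand, frame 0
```

Peptides detected by mass spectrometry can then be searched against all six translated frames, which is how evidence for unannotated or mis-annotated coding regions is typically gathered.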

The Australian National University

Ribosome signatures aid bacterial translation initiation site identification

Author: Chyżyńska Katarzyna
Giess Adam
Jonckheere Veronique
Ndah Elvis
Valen Eivind
Van Damme Petra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Background: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites

Ghent University Academic Bibliography

Directory of Open Access Journals

N-terminal proteomics assisted profiling of the unexplored translation initiation landscape in Arabidopsis thaliana

Author: Gevaert Kris
Jonckheere Veronique
Martens Lennart
Ndah Elvis
Stael Simon
Sticker Adriaan
Van Breusegem Frank
Van Damme Petra
Willems Patrick
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2017

Proteogenomics is an emerging research field yet lacking a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes

Ghent University Academic Bibliography

A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Author: Aaron R. Jex
Anja Joachim
Bronwyn E. Campbell
Cinzia Cantacessi
Makedonka Mitreva
Matthew J. Nolan
Neil D. Young
Paul W. Sternberg
Robin B. Gasser
Ross S. Hall
Sahar Abubucker
Shoba Ranganathan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, also to researchers with a limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism

CiteSeerX

ResearchOnline@JCU

Crossref

ResearchOnline at James Cook University

PubMed Central

Digital Commons@Becker

Caltech Authors

UGD Academic Repository

Macquarie University ResearchOnline

University of Melbourne Institutional Repository