Bacterial riboproteogenomics : the era of N-terminal proteoform existence revealed

Author: Fijalkowska Daria
Fijalkowski Igor
Van Damme Petra
Willems Patrick
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020

With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome re-annotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms

Ghent University Academic Bibliography

AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

Author: Bessières P.
Bossy R.
Bryson K.
Chaillou S.
Gibrat J.-F.
Hoebeke M.
Loux V.
Maguin E.
Nicolas P.
Penaud S.
van de Guchte M.
Publication date: 01/07/2006

We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license

UCL Discovery

Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification

Author: Barton Geoffrey J.
Gould Peter D.
Hall Anthony J. W.
Knop Katarzyna
Mackinnon Katarzyna
Parker Matthew T.
Schurch Nicholas J.
Sherwood Anna V.
Simpson Gordon G.
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 14/01/2020

Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3’ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3’ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode

University of Dundee Online Publications

University of East Anglia digital repository

Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

Author: Vaidyanathan P. P.
Yoon Byung-Jun
Publication date: 01/01/2007

The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. In the meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. 
These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]

Caltech Authors

Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data

Author: Kouchaki Samaneh
Robertson David L.
Tapinos Avraam
Tirunagari Santosh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-stochastic neighbor embedding (BH-tSNE) to highlight the underlying taxonomic structure of the metagenomic data. The effectiveness of our approach is demonstrated using several simulated datasets and a real metagenomic dataset

Enlighten

MisPred: a resource for identification of erroneous protein sequences in public databases

Author: Nagy Alinda
Patthy László
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats

PubMed Central

Repository of the Academy's Library

Molecular cytogenetic differentiation of paralogs of Hox paralogs in duplicated and re-diploidized genome of the North American paddlefish (Polyodon spathula).

Author: Amemiya Chris T
Flajšhans Martin
Gela David
Havelka Miloš
Howell William Mike
Kořínková Tereza
Ráb Petr
Symonová Radka
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Background: Acipenseriformes is a basal lineage of ray-finned fishes and comprises 27 extant species of sturgeons and paddlefishes. They are characterized by several specific genomic features such as broad ploidy variation, high chromosome numbers, presence of numerous microchromosomes and propensity to interspecific hybridization. The presumed palaeotetraploidy of the American paddlefish was recently validated by molecular phylogeny and Hox genes analyses. A whole genome duplication in the paddlefish lineage was estimated at approximately 42 Mya and was found to be independent from several genome duplications evidenced in its sister lineage, i.e. sturgeons. We tested the ploidy status of available chromosomal markers after the expected rediploidization. Further, we tested whether paralogs of Hox gene clusters originating from this paddlefish-specific genome duplication are cytogenetically distinguishable. Results: We found that both paralogs HoxA alpha and beta were distinguishable without any overlapping of the hybridization signal - each on one pair of large metacentric chromosomes. Of the HoxD, only the beta paralog was unequivocally identified, whereas the alpha paralog did not work and yielded only an inconclusive diffuse signal. Chromosomal markers on three diverse ploidy levels reflecting different stages of rediploidization were identified: quadruplets retaining their ancestral tetraploid condition, semi-quadruplets still reflecting the ancestral tetraploidy with clear signs of advanced rediploidization, and doublets that were diploidized with ancestral tetraploidy already blurred.
Also, some of the available microsatellite data exhibited diploid allelic band patterns at their loci, whereas another locus showed more than two alleles. Conclusions: Our exhaustive staining of paddlefish chromosomes combined with cytogenetic mapping of ribosomal genes and Hox paralogs and with microsatellite data brings a closer look at results of the process of rediploidization in the course of paddlefish genome evolution. We show a partial rediploidization represented by a complex mosaic structure comparable with segmental paleotetraploidy revealed in sturgeons (Acipenseridae). Sturgeons and paddlefishes with their high propensity for whole genome duplication thus offer suitable animal model systems to further explore evolutionary processes that were shaping the early evolution of all vertebrates

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

FigShare

University of Innsbruck Digital Library