38 research outputs found

Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction

Author: Giegerich Robert
Janssen Stefan
Schudoma Christian
Steger Gerhard
Publication venue: BioMed Central
Publication date: 01/01/2011

Janssen S, Schudoma C, Steger G, Giegerich R. Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction. BMC Bioinformatics. 2011;12(1):429. BACKGROUND: Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis. RESULTS: We extract four different models of the thermodynamic folding space which underlie the programs RNAfold, RNAshapes, and RNAsubopt. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences. CONCLUSIONS: We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similarly enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence. We provide implementations of all four models, written in a declarative style that makes them easy to modify. Based on our study, future work on thermodynamic RNA folding may make a choice of model based on our empirical data. It can take our implementations as a starting point for further program development.

Crossref

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

MPG.PuRe

proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes

Author: Bork Peer
Ducarmon Quinten R
Fullam Anthony
Huerta-Cepas Jaime
Karcher Nicolai
Khedkar Supriya
Kuhn Michael
Larralde Martin
Letunic Ivica
Maistrenko Oleksandr M
Malfertheiner Lukas
Mende Daniel R
Milanese Alessio
Rodrigues Joao Frederico Matias
Sanchis-López Claudia
Schmidt Thomas S B
Schudoma Christian
Sunagawa Shinichi
Szklarczyk Damian
von Mering Christian
Zeller Georg
Publication venue: Oxford University Press (OUP)
Publication date: 06/01/2023

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/

ZORA

Sequence–structure relationships in RNA loops: establishing the basis for loop homology modeling

Author: Abraham
Berman
Berman
Bindewald
Childs
Christian Schudoma
Das
Ding
Dirk Walther
Fiser
Gardner
Gendron
Heus
Huang
Janssen
Jonikas
Jossinet
Kabsch
Leontis
Li
Lisi
Lu
Macke
Martick
Massire
Michalsky
Panchenko
Parisien
Patrick May
Pettersen
Popenda
Richardson
Sander
Sharma
Stombaugh
Sykes
Tamura
Thore
Tuerk
Viktoria Nikiforova
Zuker
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2010

The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence–structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes, implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.

CiteSeerX

Crossref

PubMed Central

Open Repository and Bibliography - Luxembourg

MPG.PuRe

Erratum to: Bioaccumulation in aquatic systems: methodological approaches, monitoring and assessment

Author: Buchmeier Georgia
Claus Evelyn
Duester Lars
Heininger Peter
Körner Andrea
Mayer Philipp
Paschke Albrecht
Rauert Caren
Reifferscheid Georg
Rüdel Heinz
Schlechtriem Christian
Schröter-Kermani Christa
Schudoma Dieter
Schäfer Sabine
Smedes Foppe
Steffen Dieter
Vietoris Friederike
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Online Research Database In Technology

Bioaccumulation in aquatic systems: methodological approaches, monitoring and assessment

Author: Buchmeier Georgia
Claus Evelyn
Duester Lars
Heininger Peter
Körner Andrea
Mayer Philipp
Paschke Albrecht
Rauert Caren
Reifferscheid Georg
Rüdel Heinz
Schlechtriem Christian
Schröter-Kermani Christa
Schudoma Dieter
Schäfer Sabine
Smedes Foppe
Steffen Dieter
Vietoris Friederike
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Bioaccumulation, the accumulation of a chemical in an organism relative to its level in the ambient medium, is of major environmental concern. Thus, monitoring chemical concentrations in biota is widely and increasingly used for assessing the chemical status of aquatic ecosystems. In this paper, various scientific and regulatory aspects of bioaccumulation in aquatic systems and the relevant critical issues are discussed. Monitoring chemical concentrations in biota can be used for compliance checking with regulatory directives, for identification of chemical sources or event-related environmental risk assessment. Assessing bioaccumulation in the field is challenging since many factors have to be considered that can affect the accumulation of a chemical in an organism. Passive sampling can complement biota monitoring since samplers with standardised partition properties can be used over a wide temporal and geographical range. Bioaccumulation is also assessed for regulation of chemicals of environmental concern whereby mainly data from laboratory studies on fish bioaccumulation are used. Field data can, however, provide additional important information for regulators. Strategies for bioaccumulation assessment still need to be harmonised for different regulations and groups of chemicals. To create awareness for critical issues and to mutually benefit from technical expertise and scientific findings, communication between risk assessment and monitoring communities needs to be improved. Scientists can support the establishment of new monitoring programs for bioaccumulation, e.g. in the frame of the amended European Environmental Quality Standard Directive.

Springer - Publisher Connector

Fraunhofer-ePrints

PubMed Central

Online Research Database In Technology

How to sequence 10,000 bacterial genomes and retain your sanity: an accessible, efficient and global approach

Author: Baker Kate
Costigan Karl
Dykes Gregory
Feasey Nicholas
Hall Neil
Heavens Darren
Hinton Jay
Kumwenda Benjamin
Lipscombe James
Low Ross
Perez-Sepulveda Blanca
Predeus Alexander
Pulford Caisey
Rowe Will
Schudoma Christian
Shearer Neil
Watkins Chris
Webster Hermione
Publication venue: Microbiology Society
Publication date: 27/05/2022

Non-typhoidal Salmonella (NTS) are typically associated with enterocolitis and linked to the industrialisation of food production. In recent years, NTS has been associated with invasive disease (iNTS disease) causing an estimated 77,000 deaths each year worldwide; 80% of mortality occurs in sub-Saharan Africa. New clades of S. Typhimurium and S. Enteritidis have been identified, which are characterised by genomic degradation, altered prophage repertoires and novel multidrug resistant plasmids. To understand how these clades are contributing to the burden and severity of iNTS disease, it is crucial to expand genome-based surveillance to cover more countries, and incorporate historical isolates to generate an evolutionary timeline of the development of iNTS. We developed and validated a robust and inexpensive method for large-scale collection and sequencing of bacterial genomes. The “10,000 Salmonella genomes” project established a worldwide research collaboration to generate information relevant to the epidemiology, drug resistance and virulence factors of Salmonellae using a whole-genome sequencing approach. By streamlining collection of isolates and developing an efficient logistics pipeline, we gathered 10,419 clinical and environmental isolates from collections in low and middle-income countries within six months. Genome sequences are now available for isolates from 51 countries/territories dating from 1949 to 2017, with ~80% representing African and Latin-American datasets. Our method can be applied to other large sample collections that require maximisation of resources within a limited timeframe. Detailed genome analyses are in progress and it is hoped that the resulting data will contribute to public health control strategies in low and middle-income countries.

LSTM Online Archive

Data management pipeline for plant phenotyping in a multisite project

Author: Alshawi
Christian Schudoma
Dinu
Dirk Walther
Fabre
Finkel
Gibson
Gollub
Harnsomburana
Heike Sprenger
Hummel
Jaiswal
Karin I. Köhl
Kattge
Kenny Billiau
Khatri
Köhl
Lancashire
Li
Marenco
Mungall
Mungall
Nadkarni
Reynolds-Haertle
Riano-Pachon
Richards
Sayers
Sherry
Smith
Smith
Smith
Washington
Yamazaki
Zimmermann
Publication venue: CSIRO Publishing
Publication date: 01/01/2012

Crossref

An accessible, efficient and global approach for the large-scale sequencing of bacterial genomes

Author: Baker Kate S
Consortium 10KSG
Costigan Karl
Dykes Gregory F
Feasey Nicholas A
Hall Neil
Heavens Darren
Hinton Jay CD
Kumwenda Benjamin
Lipscombe James
Low Ross
Perez-Sepulveda Blanca M
Predeus Alexander V
Pulford Caisey V
Rowe Will
Schudoma Christian
Shearer Neil
Watkins Chris
Webster Hermione
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2021

We have developed an efficient and inexpensive pipeline for streamlining large-scale collection and genome sequencing of bacterial isolates. Evaluation of this method involved a worldwide research collaboration focused on the model organism Salmonella enterica, the 10KSG consortium. Following the optimization of a logistics pipeline that involved shipping isolates as thermolysates in ambient conditions, the project assembled a diverse collection of 10,419 isolates from low- and middle-income countries. The genomes were sequenced using the LITE pipeline for library construction, with a total reagent cost of less than USD$10 per genome. Our method can be applied to other large bacterial collections to underpin global collaborations

University of Liverpool Repository

Queen's University Belfast Research Portal

LSTM Online Archive

LSHTM Research Online

PubMed Central

University of East Anglia digital repository

An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb that has a high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known and identified one novel genome rearrangements. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop

Crossref

University of Birmingham Research Portal

eScholarship - University of California

PuSH

University of East Anglia digital repository

Rothamsted Repository

Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins

Author: A Kreegipuu
A Remenyi
A Zien
AS Mah
B Boeckmann
BE Kemp
C Sander
CH Ding
Christian Schudoma
D Plewczynski
D Schwartz
Dirk Walther
DT Denhardt
E Nishida
F Diella
F Gnad
G Manning
J Ptacek
J Qin
JA Hanley
JC Obenauer
JD Thompson
JH Kim
JL Jimenez
Joachim Selbig
JP Vert
K Alexandros
K Niefind
KY Cheng
L Rychlewski
LA Pinna
LM Iakoucheva
LN Johnson
M Levitt
M Pirooznia
MB Yaffe
N Blom
N Blom
Pawel Durek
R Burbidge
R Linding
RW Hooft
S Kawashima
SC Bagley
SC Fan
SK Hanks
T Hunter
T Joachims
T Zhou
TD Schneider
U Reimer
V Vapnik
W Kabsch
W Weckwerth
Wolfram Weckwerth
Y Park
Y Wang
Y Xue
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. Results: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the Protein Data Bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence as well as structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information.
Conclusion: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

MPG.PuRe