On the application of Bayesian statistics to protein structure calculation from nuclear magnetic resonance data
In the present work, we use concepts of Bayesian statistics to infer the three-dimensional structures of proteins from experimental data. We thus build upon the method of inferential structure determination (ISD) introduced by Rieping et al. (2005). In line with their probabilistic approach, we factor the probability of a three-dimensional protein structure given the experimental data into two parts. The first is a prior distribution, which captures the protein-likeness of a structure. The second is a likelihood, which describes how probable the experimental data are given a particular three-dimensional structure. Within this Bayesian framework, we aim to turn structure calculation from NMR experiments into a highly accurate, objective and parameter-free process.
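The factorization can be written as follows. The notation is assumed rather than taken from the thesis: X is the three-dimensional structure, D the NMR data, I the background information, E(X) a potential function and β an inverse temperature. The Boltzmann form of the prior is a standard choice and is also an assumption here. The normalizing constant Z(D), the model evidence, is the quantity that Bayesian model comparison uses to rank alternative priors or parameter settings.

```latex
p(X \mid D, I) = \frac{p(D \mid X, I)\,\pi(X \mid I)}{Z(D)}, \qquad
\pi(X \mid I) \propto \exp\{-\beta\,E(X)\}, \qquad
Z(D) = \int p(D \mid X, I)\,\pi(X \mid I)\,\mathrm{d}X
```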
We start by integrating new types of data, because ISD currently provides no mechanism for incorporating chemical shifts into the calculation. To address this shortcoming, we propose a hidden Markov model that captures the relationship between protein structures and chemical shifts. Based on this probabilistic model, we can predict the secondary structure and dihedral angles of a protein from its chemical shifts.
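To show how a hidden Markov model turns a sequence of chemical shifts into a secondary-structure assignment, here is a minimal Viterbi decoder. It is a generic sketch, not the thesis's model, which also predicts dihedral angles.

Everything specific below is hypothetical and chosen only for illustration:
- three states: helix, strand and coil;
- a single Gaussian-distributed secondary Cα shift per residue;
- the emission means and standard deviation;
- the transition probabilities.

```python
import math

# Hypothetical toy parameters for illustration only (not fitted to any data).
STATES = ("H", "E", "C")  # helix, strand, coil
START = {s: math.log(1 / 3) for s in STATES}
# Secondary-structure elements tend to persist, so self-transitions dominate.
TRANS = {s: {t: math.log(0.9 if s == t else 0.05) for t in STATES} for s in STATES}
# Hypothetical Gaussian emission model for a secondary C-alpha shift (ppm).
EMIT_MEAN = {"H": 2.5, "E": -1.5, "C": 0.0}
EMIT_SD = 1.0


def log_emission(state, shift):
    """Log-density of observing `shift` in `state` under the Gaussian model."""
    z = (shift - EMIT_MEAN[state]) / EMIT_SD
    return -0.5 * z * z - math.log(EMIT_SD * math.sqrt(2 * math.pi))


def viterbi(shifts):
    """Return the most probable state path (e.g. "HHEEC") for a list of shifts."""
    # score[s]: best log-probability of any path that ends in state s
    score = {s: START[s] + log_emission(s, shifts[0]) for s in STATES}
    back = []  # back[k][t]: best predecessor of state t at position k + 1
    for x in shifts[1:]:
        prev = score
        score, ptr = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + TRANS[s][t])
            ptr[t] = best
            score[t] = prev[best] + TRANS[best][t] + log_emission(t, x)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

With these toy parameters, `viterbi([2.8, 2.2, 3.0, 2.6, -1.8, -1.2, -2.0, 0.0, 0.1, -0.1, 0.0])` returns `"HHHHEEECCCC"`. The transition penalty keeps isolated noisy shifts from breaking up a segment.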
Another route to high-quality structures is to improve the potential functions that form the core of ISD's prior distributions. Potential functions are designed to approximate physical forces, but they still contain parameters, such as force constants and temperatures, that are set ad hoc and can bias the structure calculation. As an alternative, we propose an algorithm based on Bayesian model comparison that determines these parameters from the data. We further demonstrate that optimal, data-dependent parameters improve the accuracy and quality of the final structure, especially with sparse and noisy data. These findings argue against a single universal parameter value. They favor estimating free parameters from the experimental data instead.
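The idea of choosing a parameter by Bayesian model comparison can be shown on a toy problem of our own construction. It is not the thesis's algorithm.

The setup is as follows:
- A single structural coordinate x has a harmonic prior π(x | w) ∝ exp(−w (x − x0)²/2), so the weight w plays the role of a force constant.
- The data are Gaussian, y_i ~ N(x, σ²).
- In this conjugate case the evidence p(y | w) has a closed form, so w can be chosen by maximizing it over a grid.

The function names and all numbers below are hypothetical.

```python
import numpy as np


def log_evidence(y, w, x0=0.0, sigma=1.0):
    """log p(y | w) for the toy model x ~ N(x0, 1/w), y_i ~ N(x, sigma^2).

    Integrating out x leaves y ~ N(x0 * 1, sigma^2 I + (1/w) 1 1^T).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    cov = sigma**2 * np.eye(n) + np.ones((n, n)) / w
    r = y - x0
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + n * np.log(2.0 * np.pi))


def best_weight(y, weights, **kwargs):
    """Grid point with the highest evidence (type-II maximum likelihood)."""
    scores = [log_evidence(y, w, **kwargs) for w in weights]
    return weights[int(np.argmax(scores))]


weights = np.logspace(-3, 3, 61)
w_conflict = best_weight([2.5, 3.5, 2.8, 3.2, 3.0], weights)   # data disagree with x0
w_agree = best_weight([0.1, -0.1, 0.05, -0.05, 0.0], weights)  # data agree with x0
```

When the data agree with the prior's preferred value x0, the evidence favors the largest weight on the grid. When the data conflict with x0, it favors a small weight. For realistic potentials the evidence has no closed form and has to be estimated by simulation.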
Third, we focus on estimating new potential functions to incorporate even more prior information into the structure calculation. Currently, only a few methods allow potential functions to be estimated from a database of known structures. Our method provides a sound mathematical solution to this problem, which is also known as the inverse problem of statistical mechanics. We demonstrate the effectiveness of our approach on simple fluids and on a coarse-grained protein model.
In this work, we present new approaches based on Bayesian statistics for interpreting experimental data in NMR spectroscopy. We build on the work of Rieping et al. (2005), who introduced the principle of inferential structure determination (ISD). Their probabilistic approach factors the posterior distribution into two parts. One is the prior distribution, which assesses the protein-likeness of a candidate structure. The other is the likelihood function, which describes the agreement with the experimental data. The aim of this work is to improve both the quality and the comparability of structure calculations in NMR spectroscopy.
First, we address the integration of new types of experimental data into the structure calculation. To this end, we propose a hidden Markov model that predicts dihedral angles and secondary structure from chemical shifts.
An alternative to integrating additional experimental information is to improve the prior distribution. In ISD, the prior distribution is based on a potential function that approximates the free energy. Potential functions nevertheless contain free parameters, such as the temperature or the force constant, that must be set. We use Bayesian hypothesis testing to determine these free parameters objectively and from the experimental data. Bayesian hypothesis testing also lets us combine different potential functions, so that accurate structures can be determined even from noisy and incomplete data. Our studies further show that no universally valid force constant exists for statistical potentials, and that it should be determined from the experimental data.
In the third part of this work, we introduce a method for learning new force fields from structure databases, which improves the prior distribution further. This nonlinear problem is also known as the inverse problem of statistical mechanics. We solve it by generalizing the concept of the "configurational temperature". We use our method to reconstruct the potential functions of simplified molecular dynamics force fields.
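One generic way to see how the configurational temperature generalizes into a potential estimator is through the integration-by-parts identity ⟨∇U · B⟩ = k_BT ⟨∇ · B⟩. It holds under the Boltzmann distribution for any smooth vector field B, assuming boundary terms vanish. We assume the potential is linear in its parameters, U = Σ_k θ_k φ_k, and choose B = ∇φ_j. This gives a linear system for θ from equilibrium samples. The classic configurational temperature k_BT = ⟨|∇U|²⟩/⟨∇²U⟩ is the special case B = ∇U.

This is a sketch under those assumptions, not necessarily the thesis's formulation. The function name, basis functions and toy numbers are hypothetical.

```python
import numpy as np


def fit_linear_potential(grad_phis, lap_phis, kT=1.0):
    """Estimate theta in U(x) = sum_k theta_k * phi_k(x) from Boltzmann samples.

    Solves sum_k theta_k <grad phi_j . grad phi_k> = kT <lap phi_j> for all j,
    i.e. the identity <grad U . B> = kT <div B> with test fields B = grad phi_j.

    grad_phis: (n_samples, K, dim) gradients of the K basis functions
    lap_phis:  (n_samples, K) Laplacians of the K basis functions
    """
    n = grad_phis.shape[0]
    A = np.einsum("nja,nka->jk", grad_phis, grad_phis) / n
    b = kT * lap_phis.mean(axis=0)
    return np.linalg.solve(A, b)


# Toy check in 1D: U(x) = a x^2 / 2 + b x, so Boltzmann samples are
# Gaussian with mean -b/a and variance kT/a.
rng = np.random.default_rng(0)
a_true, b_true, kT = 2.0, 1.0, 1.0
x = rng.normal(-b_true / a_true, np.sqrt(kT / a_true), size=200_000)

# Basis phi_1 = x^2 / 2 (gradient x, Laplacian 1), phi_2 = x (gradient 1, Laplacian 0)
grads = np.stack([x, np.ones_like(x)], axis=1)[:, :, None]
laps = np.stack([np.ones_like(x), np.zeros_like(x)], axis=1)
theta = fit_linear_potential(grads, laps, kT)  # should be close to [2.0, 1.0]

# Classic configurational temperature <|grad U|^2> / <lap U> using the true U
kT_conf = np.mean((a_true * x + b_true) ** 2) / a_true  # should be close to 1.0
```

In real applications the samples come from a structure database or a simulation, and the basis functions are pair or torsion terms rather than polynomials.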
Robust probabilistic superposition and comparison of protein structures
Background: Protein structure comparison is a central issue in structural bioinformatics. The standard dissimilarity measure for protein structures is the root mean square deviation (RMSD) of representative atom positions such as α-carbons. To evaluate the RMSD, the structures under comparison must first be superimposed optimally, that is, rotated and translated so as to minimize the RMSD. What counts as an optimal fit becomes a matter of debate if the structures contain regions that differ substantially. This situation arises in NMR ensembles and in proteins undergoing large-scale conformational transitions.
Results: We present a probabilistic method for robust superposition and comparison of protein structures. Our method aims to identify the largest structurally invariant core. To do so, we model non-rigid displacements in protein structures with outlier-tolerant probability distributions. These distributions have heavier tails than the Gaussian distribution underlying standard RMSD minimization and therefore accommodate highly divergent structural regions. The drawback is that, under a heavy-tailed model, analytical expressions for the optimal superposition no longer exist. To circumvent this problem, we work with a scale mixture representation, which implies a weighted RMSD. We develop two iterative procedures, an Expectation Maximization algorithm and a Gibbs sampler, to estimate the local weights, the optimal superposition, and the parameters of the heavy-tailed distribution. Applications demonstrate that heavy-tailed models capture differences between structures undergoing substantial conformational changes. They can also be used to assess the precision of NMR structures. By comparing Bayes factors, we can automatically choose the most adequate model, so our method is parameter-free.
Conclusions: Heavy-tailed distributions are well suited to describing large-scale conformational differences in protein structures. A scale mixture representation facilitates the fitting of these distributions and enables outlier-tolerant superposition.
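The abstract notes that a scale mixture representation leads to a weighted RMSD fitted iteratively. A minimal Expectation Maximization sketch of that idea is given below.

Assumptions:
- The heavy-tailed model is a Student-t, written as a Gaussian scale mixture with Gamma-distributed precisions.
- The degrees of freedom ν are held fixed, whereas the paper also estimates the distribution parameters.
- The Gibbs sampler and the Bayes-factor model choice are not shown.

The function names are hypothetical.

```python
import numpy as np


def weighted_superposition(X, Y, w):
    """Rotation R and translation t minimizing sum_i w_i * |x_i - (R y_i + t)|^2."""
    w = w / w.sum()
    xc, yc = w @ X, w @ Y                      # weighted centroids
    H = (Y - yc).T @ ((X - xc) * w[:, None])   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, xc - R @ yc


def robust_superposition(X, Y, nu=3.0, n_iter=50):
    """EM fit of x_i = R y_i + t + e_i with Student-t errors (nu held fixed).

    The Student-t is a Gaussian scale mixture: e_i ~ N(0, sigma^2 / s_i * I)
    with s_i ~ Gamma(nu/2, rate nu/2). The E-step computes E[s_i], which act as
    per-atom weights in a weighted RMSD; the M-step is a weighted superposition.
    """
    w = np.ones(len(X))
    for _ in range(n_iter):
        # M-step: weighted superposition, then the displacement scale sigma^2
        R, t = weighted_superposition(X, Y, w)
        d2 = np.sum((X - (Y @ R.T + t)) ** 2, axis=1)
        sigma2 = max(np.sum(w * d2) / (3 * len(X)), 1e-12)
        # E-step: expected precisions; atoms with large displacements get small weights
        w = (nu + 3.0) / (nu + d2 / sigma2)
    return R, t, w
```

Atoms in the invariant core end up with weights near (ν + 3)/(ν + 1), the value at d_i² = σ². Strongly displaced regions are driven toward zero weight, so they barely influence the fitted rotation and translation.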
Bayesian Weighting of Statistical Potentials in NMR Structure Calculation
The use of statistical potentials in NMR structure calculation improves the accuracy of the final structure but also raises issues of double counting and possible bias. Because statistical potentials are averaged over a large set of structures, they may not reflect the preferences of a particular structure or data set. We propose a Bayesian method to incorporate a knowledge-based backbone dihedral angle potential into an NMR structure calculation. To avoid bias introduced by the backbone potential, we adjust its weight by inferring it from the experimental data. We demonstrate that an optimally weighted potential improves the accuracy and quality of the final structure, especially with sparse and noisy data. Our findings suggest that no universally optimal weight exists and that the weight should be determined from the experimental data. Other knowledge-based potentials can be incorporated using the same approach.
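A weighted knowledge-based backbone potential enters the prior as exp(−w · E). The sketch below uses one common construction of E: an inverse-Boltzmann histogram over (φ, ψ) built from database angles. That is an assumption; the abstract does not say how the paper's potential is built. Under this construction, a weight of 1 reproduces the database statistics and a weight of 0 switches the potential off.

The function names, bin count and pseudocount are hypothetical. Dihedral periodicity is handled only by binning on [−180°, 180°].

```python
import numpy as np


def ramachandran_potential(phi, psi, n_bins=36, pseudocount=1.0):
    """Histogram-based (inverse Boltzmann) potential E(phi, psi) = -log p(phi, psi).

    phi, psi: backbone dihedral angles in degrees taken from a structure database.
    Returns the bin edges and an (n_bins, n_bins) table of energies (in units of kT).
    """
    edges = np.linspace(-180.0, 180.0, n_bins + 1)
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    # Pseudocounts keep empty bins finite; p sums to 1 over all bins.
    p = (counts + pseudocount) / (counts.sum() + pseudocount * counts.size)
    return edges, -np.log(p)


def weighted_energy(table, edges, phi, psi, w):
    """w * sum_i E(phi_i, psi_i); the prior is proportional to exp(-weighted_energy)."""
    phi, psi = np.asarray(phi, dtype=float), np.asarray(psi, dtype=float)
    i = np.clip(np.digitize(phi, edges) - 1, 0, len(edges) - 2)
    j = np.clip(np.digitize(psi, edges) - 1, 0, len(edges) - 2)
    return w * table[i, j].sum()
```

Because exp(−w · E) = p^w, the weight acts like an inverse temperature on the database distribution. This is why it can be treated as a free parameter and inferred from the experimental data.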
Influence of the weight on the structural ensemble of Fyn-SH3 inferred with sparse NMR data.
Shown are the conformations and backbone dihedral distributions generated with different weights. Panels A–C display structure ensembles of ten randomly selected conformations (grey) superimposed onto the crystal structure (red). Panels D–F show in black a maximum entropy distribution fitted to the backbone torsion angles of the structures generated with ISD. The backbone dihedral angles of the crystal structure are marked by red dots. The panel pairs A and D, B and E, and C and F show results for three different weights. Panels B and E correspond to the optimal weight. Panels C and F correspond to the maximum weight probed during the replica-exchange simulations.
Bayesian weighting of the backbone potential for ubiquitin inferred from high-quality distance data.
A: Correlation between the backbone potential and the nonbonded force field. Shown is the joint distribution of the physics- and knowledge-based contributions in the absence of any structural data. (The energies of the crystal structure are … and ….) B: Model evidence as a function of the Ramachandran weight. C: Influence of the Ramachandran weight on the average Q-factor (red dashed line), calculated for 11 RDC data sets that were not used in the structure calculation. The Q-factor reflects the agreement between experimental and calculated RDCs. The dotted black line indicates the average Q-factor of the crystal structure (PDB code: 1ubq). D: Influence of the Ramachandran weight on the fit to scalar coupling measurements (red dashed line). Six three-bond scalar coupling data sets are available for ubiquitin and were not used in the structure calculation. The dotted black line indicates the average Q-factor of the crystal structure (PDB code: 1ubq). The grey distribution indicates the model evidence.
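The RDC Q-factor in panel C measures agreement between experimental and calculated couplings. The sketch below uses the common normalization Q = rms(D_obs − D_calc) / rms(D_obs). Other normalizations exist, for example one based on the alignment tensor magnitude, and the caption does not say which was used.

Obtaining D_calc requires fitting an alignment tensor to each structure, which is not shown here. The function names are hypothetical.

```python
import numpy as np


def rdc_q_factor(d_obs, d_calc):
    """Q = rms(D_obs - D_calc) / rms(D_obs); 0 means perfect agreement."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_calc = np.asarray(d_calc, dtype=float)
    return float(np.sqrt(np.mean((d_obs - d_calc) ** 2) / np.mean(d_obs**2)))


def average_q(datasets):
    """Average Q over several (d_obs, d_calc) pairs, e.g. independent RDC data sets."""
    return float(np.mean([rdc_q_factor(obs, calc) for obs, calc in datasets]))
```

For example, `rdc_q_factor([10.0, -10.0], [9.0, -11.0])` gives 0.1. Every coupling is off by 1 Hz, and the observed couplings have an rms of 10 Hz.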
Impact of incomplete ubiquitin data on the Ramachandran weight.
Shown is the model evidence as a function of the Ramachandran weight (grey) and the average RMSD (dots). Sparsity increases from the top-left panel to the bottom-right panel.
