114 research outputs found

    Pairwise and higher-order correlations among drug-resistance mutations in HIV-1 subtype B protease

    Background: The reaction of HIV protease to inhibitor therapy is characterized by the emergence of complex mutational patterns that confer drug resistance. The response of HIV protease to drugs often involves both primary mutations that directly inhibit the action of the drug, and a host of accessory resistance mutations that may occur far from the active site but may contribute to restoring the fitness or stability of the enzyme. Here we develop a probabilistic approach based on connected information that allows us to study residue, pair level and higher-order correlations within the same framework.
    Results: We apply our methodology to a database of approximately 13,000 sequences annotated with the treatment history of the patients from whom the samples were obtained. We show that including pair interactions is essential for agreement with the mutational data, since neglect of these interactions results in order-of-magnitude errors in the probabilities of the simultaneous occurrence of many mutations. The magnitude of these pair correlations changes dramatically between sequences obtained from patients who were or were not exposed to drugs. Higher-order effects make a contribution of as much as 10% for residues taken three at a time, but increase to more than twice that for 10 to 15-residue groups. The sequence data are insufficient to determine the higher-order effects for larger groups. We find that higher-order interactions have a significant effect on the predicted frequencies of sequences with large numbers of mutations. While relatively rare, such sequences are more prevalent after multi-drug therapy. The relative importance of these higher-order interactions increases with the number of drugs the patient had been exposed to.
    Conclusion: Correlations are critical for the understanding of mutation patterns in HIV protease. Pair interactions have substantial qualitative effects, while higher-order interactions are individually smaller but may have a collective effect. Together they lead to correlations which could have an important impact on the dynamics of the evolution of cross-resistance, by allowing the virus to pass through otherwise unlikely mutational states. These findings also indicate that pairwise and possibly higher-order effects should be included in the models of protein evolution, instead of assuming that all residues mutate independently of one another.
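
The comparison at the heart of this abstract, observed co-occurrence of mutations against what independently mutating sites would predict, can be illustrated with a toy calculation. This is only a minimal sketch and not the authors' connected-information method: the function name `pair_correlation_ratios`, the binary encoding (1 = mutated, 0 = wild type) and the four example sequences are all assumptions made up for illustration.

```python
from itertools import combinations


def pair_correlation_ratios(sequences):
    """Ratio of observed to independent-site joint mutation frequency.

    sequences: list of equal-length tuples of 0/1 (hypothetical encoding,
    1 = position is mutated). Returns {(i, j): observed / expected}.
    A ratio far from 1 signals that an independent-site model misjudges
    how often the two mutations occur together.
    """
    n = len(sequences)
    length = len(sequences[0])
    # Single-site mutation frequencies.
    single = [sum(s[i] for s in sequences) / n for i in range(length)]
    ratios = {}
    for i, j in combinations(range(length), 2):
        joint = sum(1 for s in sequences if s[i] and s[j]) / n
        expected = single[i] * single[j]  # independence assumption
        if expected > 0:
            ratios[(i, j)] = joint / expected
    return ratios


# Made-up toy data: positions 0 and 1 always mutate together.
toy = [(1, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
ratios = pair_correlation_ratios(toy)
```

In the toy data the pair (0, 1) co-occurs twice as often as independence would predict, while pairs involving position 2 never co-occur. Extending this to many simultaneous mutations is where, according to the abstract, the independent model's errors reach orders of magnitude.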

    Diffusive hidden Markov model characterization of DNA looping dynamics in tethered particle experiments

    In many biochemical processes, proteins bound to DNA at distant sites are brought into close proximity by loops in the underlying DNA. For example, the function of some gene-regulatory proteins depends on such DNA looping interactions. We present a new technique for characterizing the kinetics of loop formation in vitro, as observed using the tethered particle method, and apply it to experimental data on looping induced by lambda repressor. Our method uses a modified (diffusive) hidden Markov analysis that directly incorporates the Brownian motion of the observed tethered bead. We compare looping lifetimes found with our method (which we find are consistent over a range of sampling frequencies) to those obtained via the traditional threshold-crossing analysis (which can vary depending on how the raw data are filtered in the time domain). Our method does not involve any time filtering and can detect sudden changes in looping behavior. For example, we show how our method can identify transitions between long-lived, kinetically distinct states that would otherwise be difficult to discern.
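
For context, the traditional threshold-crossing analysis that this abstract contrasts with its hidden Markov approach can be sketched in a few lines. This is an illustrative assumption of what such an analysis looks like, not code from the paper: the function name `dwell_times`, the `dt` parameter and the toy trace are hypothetical, and no time-domain filtering is applied.

```python
def dwell_times(trace, threshold, dt=1.0):
    """Split a signal into dwells below and above a threshold.

    trace: sequence of numbers (e.g. bead excursion amplitude per frame).
    threshold: value separating the two states (assumed: below = looped).
    dt: time per sample. Returns (below, above) lists of dwell durations.
    The first and last dwells are cut off by the observation window.
    """
    below, above = [], []
    if not trace:
        return below, above
    state_is_below = trace[0] < threshold
    run = 1
    for x in trace[1:]:
        is_below = x < threshold
        if is_below == state_is_below:
            run += 1
        else:
            # State changed: record the finished dwell and start a new one.
            (below if state_is_below else above).append(run * dt)
            state_is_below, run = is_below, 1
    (below if state_is_below else above).append(run * dt)
    return below, above


# Made-up toy trace: two low samples, three high, one low.
below, above = dwell_times([1, 1, 5, 5, 5, 1], threshold=3)
```

Because every noisy excursion across the threshold ends a dwell, the lifetimes this produces depend on how the raw trace was smoothed beforehand, which is the sensitivity the abstract points out.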

    A novel Bayesian approach to quantify clinical variables and to determine their spectroscopic counterparts in 1H NMR metabonomic data

    Background: A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by 1H NMR spectroscopy of serum.
    Results: A Bayesian methodology, with a biochemical motivation, is presented for a real 1H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the 1H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the 1H NMR spectra.
    Conclusion: The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.

    Graphical models for inferring single molecule dynamics

    Background: The recent explosion of experimental techniques in single molecule biophysics has generated a variety of novel time series data requiring equally novel computational tools for analysis and inference. This article describes in general terms how graphical modeling may be used to learn from biophysical time series data using the variational Bayesian expectation maximization algorithm (VBEM). The discussion is illustrated by the example of single-molecule fluorescence resonance energy transfer (smFRET) versus time data, where the smFRET time series is modeled as a hidden Markov model (HMM) with Gaussian observables. A detailed description of smFRET is provided as well.
    Results: The VBEM algorithm returns the model’s evidence and an approximating posterior parameter distribution given the data. The former provides a metric for model selection via maximum evidence (ME), and the latter a description of the model’s parameters learned from the data. ME/VBEM provide several advantages over the more commonly used approach of maximum likelihood (ML) optimized by the expectation maximization (EM) algorithm, the most important being a natural form of model selection and a well-posed (non-divergent) optimization problem.
    Conclusions: The results demonstrate the utility of graphical modeling for inference of dynamic processes in single molecule biophysics.

    Modular protein-RNA interactions regulating mRNA metabolism: a role for NMR

    Here we review the role played by transient interactions between multi-functional proteins and their RNA targets in the regulation of mRNA metabolism, and we describe the important function of NMR spectroscopy in the study of these systems. We place emphasis on a general approach for the study of different features of modular multi-domain recognition that uses well-established NMR techniques and that has provided important advances in the general understanding of post-transcriptional regulation.

    Probabilistic Interaction Network of Evidence Algorithm and its Application to Complete Labeling of Peak Lists from Protein NMR Spectroscopy

    The process of assigning a finite set of tags or labels to a collection of observations, subject to side conditions, is notable for its computational complexity. This labeling paradigm is of theoretical and practical relevance to a wide range of biological applications, including the analysis of data from DNA microarrays, metabolomics experiments, and biomolecular nuclear magnetic resonance (NMR) spectroscopy. We present a novel algorithm, called Probabilistic Interaction Network of Evidence (PINE), that achieves robust, unsupervised probabilistic labeling of data. The computational core of PINE uses estimates of evidence derived from empirical distributions of previously observed data, along with consistency measures, to drive a fictitious system M with Hamiltonian H to a quasi-stationary state that produces probabilistic label assignments for relevant subsets of the data. We demonstrate the successful application of PINE to a key task in protein NMR spectroscopy: that of converting peak lists extracted from various NMR experiments into assignments associated with probabilities for their correctness. This application, called PINE-NMR, is available from a freely accessible computer server (http://pine.nmrfam.wisc.edu). The PINE-NMR server accepts as input the sequence of the protein plus user-specified combinations of data corresponding to an extensive list of NMR experiments; it provides as output a probabilistic assignment of NMR signals (chemical shifts) to sequence-specific backbone and aliphatic side chain atoms plus a probabilistic determination of the protein secondary structure. PINE-NMR can accommodate prior information about assignments or stable isotope labeling schemes. As part of the analysis, PINE-NMR identifies, verifies, and rectifies problems related to chemical shift referencing or erroneous input data. PINE-NMR achieves robust and consistent results that have been shown to be effective in subsequent steps of NMR structure determination.

    Modeling Conformational Ensembles of Slow Functional Motions in Pin1-WW

    Protein-protein interactions are often mediated by flexible loops that experience conformational dynamics on the microsecond to millisecond time scales. NMR relaxation studies can map these dynamics. However, defining the network of inter-converting conformers that underlie the relaxation data remains generally challenging. Here, we combine NMR relaxation experiments with simulation to visualize networks of inter-converting conformers. We demonstrate our approach with the apo Pin1-WW domain, for which NMR has revealed conformational dynamics of a flexible loop in the millisecond range. We sample and cluster the free energy landscape using Markov State Models (MSM) with major and minor exchange states with high correlation with the NMR relaxation data and low NOE violations. These MSMs are hierarchical ensembles of slowly interconverting, metastable macrostates and rapidly interconverting microstates. We found a low population state that consists primarily of holo-like conformations and is a “hub” visited by most pathways between macrostates. These results suggest that conformational equilibria between holo-like and alternative conformers pre-exist in the intrinsic dynamics of apo Pin1-WW. Analysis using MutInf, a mutual information method for quantifying correlated motions, reveals that WW dynamics not only play a role in substrate recognition, but also may help couple the substrate binding site on the WW domain to the one on the catalytic domain. Our work represents an important step towards building networks of inter-converting conformational states and is generally applicable.
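
At its core, a Markov state model is a matrix of transition probabilities between discretized conformational states. The sketch below shows the simplest estimator, row-normalized transition counts at a fixed lag time, assuming the simulation has already been clustered into integer state labels. It is not the authors' pipeline (which additionally builds a hierarchy of macrostates and microstates and validates against NMR data); the function name `estimate_transition_matrix` and the toy trajectory are hypothetical.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts from a discrete trajectory.

    dtraj: list of integer state labels in [0, n_states), one per frame.
    lag: lag time in frames (must be >= 1).
    Returns T as a list of rows, where T[a][b] estimates the probability
    of being in state b `lag` frames after being in state a.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited as a starting point get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Made-up two-state toy trajectory.
T = estimate_transition_matrix([0, 0, 1, 0, 1, 1], n_states=2)
```

This plain count estimator does not enforce detailed balance; production MSM software typically offers reversible estimators for that purpose.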

    Calculation of the Free Energy and Cooperativity of Protein Folding

    Calculation of the free energy of protein folding and delineation of its pre-organization are of foremost importance for understanding, predicting and designing biological macromolecules. Here, we introduce an energy smoothing variant of parallel tempering replica exchange Monte Carlo (REMS) that allows for efficient configurational sampling of flexible solutes under the conditions of molecular hydration. Its usage to calculate the thermal stability of a model globular protein, Trp cage TC5b, achieves excellent agreement with experimental measurements. We find that the stability of TC5b is attained through the coupled formation of local and non-local interactions. Remarkably, many of these structures persist at high temperature, concomitant with the origin of native-like configurations and mesostates in an otherwise macroscopically disordered unfolded state. Graph manifold learning reveals that the conversion of these mesostates to the native state is structurally heterogeneous, and that the cooperativity of their formation is encoded largely by the unfolded state ensemble. In all, these studies establish the extent of thermodynamic and structural pre-organization of folding of this model globular protein, and achieve the calculation of macromolecular stability ab initio, as required for ab initio structure prediction, genome annotation, and drug design.
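
The parallel tempering scheme that REMS builds on exchanges configurations between replicas held at different temperatures. A minimal sketch of the standard Metropolis swap criterion follows; the paper's energy-smoothing modification is not described in the abstract and is not reproduced here. The function name `swap_acceptance` and the numerical values are assumptions chosen for illustration.

```python
import math


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Standard parallel-tempering acceptance probability for a swap.

    beta_i, beta_j: inverse temperatures 1/(kT) of the two replicas.
    energy_i, energy_j: current potential energies of their configurations.
    Returns min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)]).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Colder replica (larger beta) currently holds the higher energy:
# the swap is always accepted.
p_favourable = swap_acceptance(1.0, 0.5, 2.0, 1.0)
# Colder replica holds the lower energy: accepted with probability exp(-0.5).
p_unfavourable = swap_acceptance(1.0, 0.5, 1.0, 2.0)
```

In a full simulation each swap attempt would draw a uniform random number and exchange the two configurations when it falls below this probability.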

    Single Molecule Analysis Research Tool (SMART): An Integrated Approach for Analyzing Single Molecule Data

    Single molecule studies have expanded rapidly over the past decade and have the ability to provide an unprecedented level of understanding of biological systems. A common challenge upon introduction of novel, data-rich approaches is the management, processing, and analysis of the complex data sets that are generated. We provide a standardized approach for analyzing these data in the freely available software package SMART: Single Molecule Analysis Research Tool. SMART provides a format for organizing and easily accessing single molecule data, a general hidden Markov modeling algorithm for fitting an array of possible models specified by the user, a standardized data structure, and graphical user interfaces to streamline the analysis and visualization of data. This approach guides experimental design, facilitating acquisition of the maximal information from single molecule experiments. SMART also provides a standardized format to allow dissemination of single molecule data and transparency in the analysis of reported data.