
    Using Entropy Maximization to Understand the Determinants of Structural Dynamics beyond Native Contact Topology

    Comparison of elastic network model predictions with experimental data has provided important insights into the dominant role of the network of inter-residue contacts in defining the global dynamics of proteins. Most of these studies have focused on interpreting the mean-square fluctuations of residues, or on deriving the most collective, or softest, modes of motion, which are known to be insensitive to structural and energetic details. However, with increasing structural data, we are in a position to perform a more critical assessment of the structure-dynamics relations in proteins and to gain a deeper understanding of the major determinants not only of the mean-square fluctuations and lowest-frequency modes, but also of the covariance, or cross-correlations, between residue fluctuations and the shapes of higher modes. A large set of NMR-determined proteins is systematically analyzed using a novel method based on entropy maximization to demonstrate that the next level of refinement in the elastic network model description of proteins ought to take into consideration properties such as contact order (the sequential separation between contacting residues) and the secondary structure types of the interacting residues, whereas the types of amino acids do not play a critical role. Most importantly, an optimal description of observed cross-correlations requires the inclusion of destabilizing, as opposed to exclusively stabilizing, interactions, pointing to the functional significance of local frustration in imparting native-like dynamics. This study provides a deeper understanding of the structural basis of experimentally observed behavior and opens the way to the development of more accurate models for exploring protein dynamics.
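
For reference, the baseline that this study sets out to refine is the standard elastic network description. The following is a minimal Gaussian Network Model (GNM) sketch in plain Python, assuming one uniform spring constant and a hypothetical distance cutoff. It is not the paper's entropy-maximization method, which, as described above, aims at reproducing observed cross-correlations rather than using uniform springs.

```python
import math


def kirchhoff(coords, cutoff):
    """GNM Kirchhoff (graph Laplacian) matrix: one unit spring between every
    pair of residues whose distance is at most `cutoff` (an assumed value)."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_covariance(coords, cutoff=7.0):
    """Pseudo-inverse of the Kirchhoff matrix, proportional to the residue
    covariances <dR_i . dR_j> (the prefactor 3kT/gamma is dropped).
    Uses L+ = (L + J/n)^-1 - J/n, valid only for a connected contact graph."""
    gamma = kirchhoff(coords, cutoff)
    n = len(gamma)
    shifted = [[g + 1.0 / n for g in row] for row in gamma]
    inv = invert(shifted)
    return [[v - 1.0 / n for v in row] for row in inv]


def cross_correlations(cov):
    """Normalize covariances to cross-correlations C_ij / sqrt(C_ii * C_jj)."""
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
             for j in range(len(cov))] for i in range(len(cov))]


# Three residues 3.8 Å apart on a line; a 5 Å cutoff links only neighbours.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
cov = gnm_covariance(chain, cutoff=5.0)
# cov[0][0] ≈ 5/9 and cov[1][1] ≈ 2/9: the chain ends fluctuate more than the middle.
cc = cross_correlations(cov)
```

Swapping the uniform springs for pair-specific constants is exactly the kind of refinement the abstract argues for.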

    Progress in the development and application of computational methods for probabilistic protein design

    Proteins exhibit a wide range of physical and chemical properties, including highly selective molecular recognition and catalysis, and are key components of metabolic, catabolic, and signaling pathways. Because proteins are well structured and can now be rapidly synthesized, they are excellent targets for engineering both molecular structure and biological function. Computational analysis of the protein design problem allows scientists to explore sequence space and systematically discover novel protein molecules. Nonetheless, the complexity of proteins, the subtlety of the determinants of folding, and the exponentially large number of possible sequences impede the search for peptide sequences compatible with a desired structure and function. Directed search algorithms, which directly identify a small number of sequences, have achieved some success in finding sequences with desired structures and functions. Alternatively, one can adopt a probabilistic approach: instead of a finite number of sequences, such calculations yield a probabilistic description of the sequence ensemble. In particular, by casting the formalism in the language of statistical mechanics, the site-specific amino acid probabilities of sequences compatible with a target structure can be readily identified. These computed probabilities are well suited both for de novo design of particular sequences and for combinatorial, library-based protein engineering. The computed site-specific amino acid profile may be converted to a nucleotide base distribution to allow assembly of a partially randomized gene library. The ability to readily synthesize such degenerate oligonucleotide sequences according to the prescribed distribution is key to constructing a biased peptide library that genuinely reflects the computational design. Herein we illustrate how a standard DNA synthesizer can be used, with only a slight modification to the synthesis protocol, to generate a pool of degenerate DNA sequences that encodes a predetermined amino acid distribution with high fidelity.
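
The conversion described above runs from an amino acid profile to per-position base mixtures. The reverse check, computing which amino acid distribution a given set of base mixtures actually encodes, is easy to sketch. The version below assumes the standard genetic code and that the three codon positions are drawn independently from their own mixtures; the function name is hypothetical, and it does not reproduce the synthesizer protocol of the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for positions 1, 2, 3;
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def encoded_distribution(position_mixes):
    """Amino acid distribution encoded by a degenerate codon (hypothetical helper).

    position_mixes: three dicts mapping base -> probability, one per codon
    position; positions are assumed to be independent.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for mix, base in zip(position_mixes, codon):
            p *= mix.get(base, 0.0)
        if p > 0.0:
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


# The common "NNK" mixture: N = equal A/C/G/T, K = equal G/T at position 3.
N = {b: 0.25 for b in "ACGT"}
K = {"G": 0.5, "T": 0.5}
nnk = encoded_distribution([N, N, K])
# 32 equiprobable codons: Leu gets 3/32 (CTG, CTT, TTG); the only stop is TAG (1/32).
```

Choosing mixtures that approximate an arbitrary target profile is an optimization over these per-position probabilities and is not shown here.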

    Rational Design of Small-Molecule Inhibitors of Protein-Protein Interactions: Application to the Oncogenic c-Myc/Max Interaction

    Protein-protein interactions (PPIs) constitute an emerging class of targets for pharmaceutical intervention pursued by both industry and academia. Despite their fundamental role in many biological processes and in diseases such as cancer, PPIs are still largely underrepresented in today's drug discovery. This dissertation describes novel computational approaches developed to facilitate the discovery and design of small-molecule inhibitors of PPIs, using the oncogenic c-Myc/Max interaction as a case study. First, we critically review current approaches to, and limitations of, the discovery of small-molecule inhibitors of PPIs, with examples from the literature. Second, we examine the role of protein flexibility in molecular recognition and binding, and we review recent advances in the application of Elastic Network Models (ENMs) to modeling the global conformational changes of proteins observed upon ligand binding. The agreement between predicted soft modes of motion and structural changes observed experimentally upon ligand binding supports the view that ligand binding is facilitated, if not enabled, by the intrinsic (pre-existing) motions thermally accessible to the protein in its unliganded form. Third, we develop a new method for generating models of the bioactive conformations of molecules in the absence of a protein structure, by identifying a set of conformations (from different molecules) that are most mutually similar in terms of both shape and chemical features. We show how to solve this problem using an Integer Linear Programming formulation of the maximum-edge-weight clique problem, and we apply the method to known c-Myc/Max inhibitors. Fourth, we propose an innovative methodology for molecular mimicry design, showing how the structure of the c-Myc/Max complex was exploited to design compounds that mimic the binding interactions that Max makes with the leucine zipper domain of c-Myc. In summary, the approaches described in this dissertation constitute important contributions to the fields of computational biology and computer-aided drug discovery, which combine biophysical insights and computational methods to expedite the discovery of novel inhibitors of PPIs.
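
The clique formulation can be made concrete on a toy input. In this sketch, assumed for illustration, conformers are graph vertices, edges between conformers of different molecules carry a similarity weight, and exactly one conformer is chosen per molecule. The dissertation solves the problem with Integer Linear Programming; exhaustive enumeration is swapped in here only because it fits in a few lines of standard-library Python, and the one-number "descriptors" and the similarity function are hypothetical stand-ins for shape and chemical-feature comparison.

```python
from itertools import combinations, product


def best_mutual_conformers(molecules, similarity):
    """Pick one conformer per molecule maximizing the summed pairwise similarity.

    molecules: list of lists of conformer descriptors (one list per molecule).
    similarity: function (descriptor, descriptor) -> float (hypothetical).
    Exhaustive search: cost grows as the product of the conformer counts,
    so this only suits toy inputs (an ILP solver scales much further).
    """
    best_choice, best_score = None, float("-inf")
    for choice in product(*(range(len(confs)) for confs in molecules)):
        picked = [molecules[m][c] for m, c in enumerate(choice)]
        # Clique weight = sum of edge weights over all pairs of picked conformers.
        score = sum(similarity(a, b) for a, b in combinations(picked, 2))
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice, best_score


# Toy stand-in: each conformer is a single number, similarity = -|a - b|.
molecules = [[0.0, 5.0], [0.2, 9.0], [4.8, 0.1]]
choice, score = best_mutual_conformers(molecules, lambda a, b: -abs(a - b))
# choice == (0, 0, 1): conformers 0.0, 0.2 and 0.1 are the most mutually similar.
```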

    Effective harmonic potentials: insights into the internal cooperativity and sequence-specificity of protein dynamics

    The proper biological functioning of proteins often relies on coordinated fluctuations around their native structure, or on wider and sometimes highly elaborate motions. Coarse-grained elastic network descriptions are known to capture essential aspects of conformational dynamics in proteins, but have so far remained mostly phenomenological and unable to account for the chemical specificities of amino acids. Here, we propose a method to derive residue- and distance-specific effective harmonic potentials from the statistical analysis of an extensive dataset of NMR conformational ensembles. These potentials constitute dynamical counterparts to the mean-force statistical potentials commonly used for static analyses of protein structures. In the context of the elastic network model, they yield a strongly improved description of the cooperative aspects of residue motions, and make it possible to systematically explore the influence of sequence details on protein dynamics.
    Comment: 10 pages, 5 figures, 1 table; Supplementary Material (11 pages, 7 figures, 1 table); 4 Supplementary tables as plain text file
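
To give a feel for how spring constants can be read off an NMR ensemble, here is a deliberately naive sketch: each residue pair is treated as an isolated harmonic spring, so equipartition gives k = kT / var(d), and the constants are averaged per residue-type pair and distance bin. The isolated-spring assumption, the cutoff, the bin width, and the function name are choices made for illustration (coupling through the rest of the network is ignored); this is not the derivation used in the paper.

```python
import math
from statistics import fmean, pvariance


def effective_spring_constants(models, sequence, cutoff=10.0, bin_width=1.0):
    """Naive residue- and distance-specific spring constants, in kT/Å^2.

    models: list of conformers, each a list of (x, y, z) per residue (Å).
    Assumes each pair is an isolated 1-D harmonic spring, so k = kT / var(d).
    Returns {((type_a, type_b), distance_bin): mean k over matching pairs}.
    """
    groups = {}
    n = len(sequence)
    for i in range(n):
        for j in range(i + 1, n):
            d = [math.dist(m[i], m[j]) for m in models]
            mean_d, var_d = fmean(d), pvariance(d)
            if mean_d > cutoff or var_d == 0.0:
                continue  # skip distant pairs and rigid (zero-variance) pairs
            key = (tuple(sorted((sequence[i], sequence[j]))), int(mean_d // bin_width))
            groups.setdefault(key, []).append(1.0 / var_d)
    return {key: fmean(ks) for key, ks in groups.items()}


# Toy ensemble: four models of a two-residue fragment whose distance is
# 5.0, 5.2, 4.8 and 5.0 Å, so var(d) = 0.02 Å^2 and k = 50 kT/Å^2.
models = [[(0.0, 0.0, 0.0), (x, 0.0, 0.0)] for x in (5.0, 5.2, 4.8, 5.0)]
springs = effective_spring_constants(models, "GA")
```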

    An introduction to the maximum entropy approach and its application to inference problems in biology

    A cornerstone of statistical inference, the maximum entropy framework is increasingly being applied to construct descriptive and predictive models of biological systems, especially complex biological networks, from large experimental data sets. Both its broad applicability and the success it has achieved in different contexts rest on its conceptual simplicity and mathematical soundness. Here we concisely review the basic elements of the maximum entropy principle, starting from the notion of ‘entropy’, and describe its usefulness for the analysis of biological systems. As examples, we focus specifically on the problem of reconstructing gene interaction networks from expression data and on recent work aimed at expanding our system-level understanding of bacterial metabolism. Finally, we highlight some extensions and potential limitations of the maximum entropy approach, and point to more recent developments that are likely to play a key role in the upcoming challenges of extracting structure and information from increasingly rich, high-throughput biological data.
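
A textbook illustration of the principle (Jaynes's dice problem, not an example taken from the review): among all distributions over the faces of a die with a prescribed mean, the maximum-entropy one has the form p_i ∝ exp(-λ x_i), with the Lagrange multiplier λ fixed by the constraint. The sketch below finds λ by bisection, assuming the bracket [-50, 50] contains it; the function names are hypothetical. Constraining pairwise correlations of binary variables in the same way leads to the Ising-like pairwise models used for network reconstruction.

```python
import math


def maxent_with_mean(values, target_mean, iterations=200):
    """Maximum-entropy distribution over `values` with a prescribed mean.

    The solution has the Boltzmann form p_i ∝ exp(-lam * x_i). The mean
    decreases monotonically as lam grows, so lam is found by bisection.
    Assumes min(values) < target_mean < max(values) and |lam| < 50.
    """
    def distribution(lam):
        exps = [-lam * x for x in values]
        top = max(exps)  # shift exponents for numerical stability
        weights = [math.exp(e - top) for e in exps]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(p):
        return sum(pi * x for pi, x in zip(p, values))

    lo, hi = -50.0, 50.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean(distribution(mid)) > target_mean:
            lo = mid  # mean too high: a larger lam is needed
        else:
            hi = mid
    return distribution((lo + hi) / 2)


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


faces = [1, 2, 3, 4, 5, 6]
fair = maxent_with_mean(faces, 3.5)    # the constraint is uninformative: uniform
loaded = maxent_with_mean(faces, 4.5)  # probabilities increase with face value
```

With mean 3.5 the constraint adds no information and the uniform distribution is recovered; any other mean lowers the entropy below log 6.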

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades, a revolution in novel measurement techniques has permeated the biological sciences, filling databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics, and metabolomics to structural and ecological data. To extract insights from this vast quantity of data, computational and statistical methods are now crucial tools in every biological researcher's toolbox. In this thesis I summarize my contributions to two data-rich fields in the biological sciences: transcription factor binding to DNA, and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of the thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape readout and alternative binding modes. In addition to giving access to our methods through an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs that cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias that arises when co-evolution methods are applied to the contact prediction problem. Co-evolution methods revolutionized the protein structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way, using a variation of Felsenstein's tree-pruning algorithm combined with an independent-pair assumption to derive pairwise amino acid counts that are corrected for evolutionary history. Unfortunately, the contact predictions derived from our corrected pairwise amino acid counts did not achieve competitive performance.
    2021-09-2
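
The thesis uses a variation of Felsenstein's tree-pruning algorithm whose details are not given here. For orientation, the standard algorithm computes the likelihood of one alignment column by passing conditional likelihood vectors from the leaves up to the root. The sketch below assumes DNA, the Jukes-Cantor substitution model, and a hypothetical nested-list tree encoding; it is not the pairwise amino acid variant described in the thesis.

```python
import math

BASES = "ACGT"


def jc_prob(x, y, t):
    """Jukes-Cantor probability of base x becoming y along a branch of
    length t (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def partial_likelihood(node):
    """Felsenstein pruning: conditional likelihood vector of a subtree.

    Hypothetical encoding: a leaf is a one-letter string (the observed base);
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == node else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        child_l = partial_likelihood(child)
        for x in BASES:
            # Sum over the unobserved state y at the child end of the branch.
            result[x] *= sum(jc_prob(x, y, t) * child_l[y] for y in BASES)
    return result


def site_likelihood(tree):
    """Likelihood of one alignment column, with uniform JC root frequencies."""
    root = partial_likelihood(tree)
    return sum(0.25 * root[x] for x in BASES)


# Two leaves, A and C, on branches of length 0.1 and 0.2 from the root.
two_leaf = site_likelihood([("A", 0.1), ("C", 0.2)])
```

Because Jukes-Cantor is reversible, the two-leaf result equals 0.25 * P(A→C) over the combined branch length 0.3, which is a handy sanity check.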

    Combining computer simulations and deep learning to understand and predict protein structural dynamics

    Molecular dynamics (MD) simulations provide a means to characterize the ensemble of structures that a protein adopts in solution. These structural ensembles provide crucial information about how proteins function, and they also reveal potential drug-binding sites that are not observable in static protein structures (i.e., cryptic pockets). However, analyzing these high-dimensional datasets to understand protein function remains challenging. Additionally, finding cryptic pockets using simulation data is slow and expensive, which limits computational screening for cryptic pockets to a narrow set of circumstances. In this thesis, I develop deep learning-based methods to overcome these challenges. First, I develop a deep learning algorithm, called DiffNets, to deal with the high dimensionality of structural ensembles. DiffNets takes structural ensembles from similar systems with different biochemical properties and learns to highlight the structural features that distinguish the systems, ultimately connecting structural signatures to their associated biochemical properties. Using DiffNets, I provide structural insights that explain how naturally occurring genetic variants of the oxytocin receptor alter signaling. Additionally, DiffNets helps reveal how a SARS-CoV-2 protein involved in immune evasion becomes activated. Next, I use MD simulations to hunt for cryptic pockets across the SARS-CoV-2 proteome, which led to the discovery of more than 50 new potential druggable sites. Because this effort required an extraordinary amount of resources, I developed a deep learning approach to predict sites of cryptic pockets from single protein structures. This approach reduces the time needed to determine whether a protein has a cryptic pocket by ~10,000-fold compared to the next best method.

    An Edge-Centric Perspective for Brain Network Communities

    Thesis (Ph.D.) - Indiana University, Department of Psychological and Brain Sciences and Program in Neuroscience, 2021
    The brain is a complex system organized on multiple scales and operating in both a local and a distributed manner. Individual neurons and brain regions participate in specific functions, while at the same time existing in the context of a larger network that supports a range of different functionalities. Building brain networks composed of distinct neural elements (nodes) and their interrelationships (edges) allows us to model the brain from both local and global perspectives, and to deploy a wide array of computational network tools. A popular network analysis approach is community detection, which aims to subdivide a network's nodes into clusters that can be used to represent and evaluate network organization. Prevailing community detection approaches applied to brain networks are designed to find densely interconnected sets of nodes, leading to the notion that the brain is organized in an exclusively modular manner. Furthermore, many brain network analyses tend to focus on the nodes, as evidenced by the search for modular groupings of neural elements that might serve a common function. In this thesis, we describe the application of community detection algorithms that are sensitive to alternative cluster configurations, enhancing our understanding of brain network organization. We apply a framework called the stochastic block model, which we use to uncover evidence of non-modular organization in human anatomical brain networks across the life span and in the informatically collated rat cerebral cortex. We also propose a framework to cluster functional brain network edges in human data, which naturally results in an overlapping organization at the level of nodes that bridges canonical functional systems. These alternative methods use the connection patterns of brain network edges in ways that prevailing approaches do not. Thus, we motivate an alternative outlook that focuses on the importance of the information provided by the brain's interconnections, or edges. We call this an edge-centric perspective. The edge-centric approaches developed here offer new ways to characterize distributed brain organization and contribute to a fundamental change in how we think about the brain.
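
The abstract does not say how functional edges are clustered. One common construction, assumed here, turns each node pair into a co-fluctuation time series (the element-wise product of the two z-scored regional signals) and then compares edges by the similarity of those series; clustering the resulting edge-by-edge similarity matrix assigns each edge to one community, so a node can belong to several. The function names and toy signals are hypothetical.

```python
import math
from statistics import fmean, pstdev


def zscore(series):
    """Standardize with the population standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    return [(v - mu) / sigma for v in series]


def edge_time_series(node_series):
    """Co-fluctuation time series for every node pair (a, b).

    node_series: dict node -> list of samples (all of equal length).
    Each edge series is the element-wise product of the two z-scored node
    series, so its time average equals the Pearson correlation of the nodes.
    """
    z = {node: zscore(s) for node, s in node_series.items()}
    nodes = sorted(z)
    return {
        (a, b): [x * y for x, y in zip(z[a], z[b])]
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
    }


def edge_similarity(e1, e2):
    """Normalized inner product (cosine similarity) of two edge time series."""
    dot = sum(x * y for x, y in zip(e1, e2))
    return dot / math.sqrt(sum(x * x for x in e1) * sum(y * y for y in e2))


series = {"r1": [1.0, 2.0, 3.0, 4.0],
          "r2": [2.0, 4.0, 6.0, 8.0],
          "r3": [4.0, 3.0, 2.0, 1.0]}
edges = edge_time_series(series)
# fmean(edges[("r1", "r2")]) ≈ 1.0 and fmean(edges[("r1", "r3")]) ≈ -1.0
```

Feeding the matrix of `edge_similarity` values to any standard clustering method then yields edge communities; that choice is left open here.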