1,253 research outputs found

    Analysis on protein structures using statistical and bioinformatical methods

    This PhD dissertation focuses on the statistical analysis of protein structure data. The first research project focuses on data mining and prediction of side chain orientation in protein structures. Through this study, we find that the general side chain orientation can be viewed as a manifestation of the hydrophobic force. Along with this study, we also developed software for visualizing the general side chain vector and applied statistical machine learning methods to fit several models for predicting general side chain orientation. In the second project, we studied the motion of the partially assembled ribosome 30S subunit using the coarse-grained elastic network model. Besides our studies on ribosome motion, using 176 NMR structure ensembles, we applied principal component analysis to analyze the essential conformational changes and to validate the motion generated by the elastic network model. Furthermore, we studied the effects of different superposition methods on the correspondence between the conformational changes and the simulated motion. The principal component shaving method is often used to cluster genes with similar expression patterns in microarray data analysis. In the third research project, we applied this method to cluster the structures within an NMR structure ensemble and demonstrated that this method could be used to find the similar structure cluster
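The principal component analysis of a structure ensemble mentioned in this abstract can be illustrated with a minimal sketch. It assumes the models have already been superposed and flattened into rows of coordinates, and it recovers only the first principal component by power iteration. The function name, the toy coordinates, and the iteration count are illustrative assumptions, not the dissertation's implementation; a real analysis would more likely rely on a linear algebra library.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit vector, explained variance fraction) of the top PC.

    rows: one flattened coordinate vector per ensemble model, assumed to
    be superposed beforehand (hypothetical input format).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]

    def project(v):
        # Projection of every centered model onto direction v.
        return [sum(x[j] * v[j] for j in range(d)) for x in centered]

    # Power iteration on X^T X without forming the covariance matrix.
    # Assumption: the uniform start vector is not orthogonal to the top PC.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        proj = project(v)
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]

    top_variance = sum(p * p for p in project(v))
    total_variance = sum(c * c for x in centered for c in x)
    if total_variance == 0.0:
        return v, 0.0
    return v, top_variance / total_variance

# Toy ensemble: 4 models of a 2-atom structure (6 coordinates each).
# The first coordinate carries the dominant motion; the fifth carries
# a small jitter. All numbers are made up for illustration.
amplitudes = [-1.5, -0.5, 0.5, 1.5]
jitter = [0.01, -0.01, -0.01, 0.01]
ensemble = [[1.0 + a, 2.0, 3.0, 4.0, 5.0 + e, 6.0]
            for a, e in zip(amplitudes, jitter)]
pc1, explained = first_principal_component(ensemble)
```

In this toy case `pc1` points almost entirely along the first coordinate, which is where the dominant displacement was placed.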

    Sequence specificity despite intrinsic disorder: How a disease-associated Val/Met polymorphism rearranges tertiary interactions in a long disordered protein

    The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) in precursor brain-derived neurotrophic factor (BDNF) is one of the earliest SNPs to be associated with neuropsychiatric disorders, and the underlying molecular mechanism is unknown. Here we report on over 250 μs of fully atomistic, explicit solvent, temperature replica-exchange molecular dynamics (MD) simulations of the 91-residue BDNF prodomain, for both the V66 and M66 sequences. The simulations were able to correctly reproduce the location of both local and non-local secondary structure changes due to the Val66Met mutation, when compared with NMR spectroscopy. We find that the change in local structure is mediated via entropic and sequence-specific effects. We developed a hierarchical sequence-based framework for analysis and conceptualization, which first identifies blobs of 4-15 residues representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins; the size and shape of each blob is extracted from MD simulation of the real protein (RP), and used to parameterize a self-avoiding heterogeneous polymer (SAHP). The SAHP version of the BDNF prodomain suggested a protein segmented into three regions, with a central long, highly disordered polyampholyte linker separating two globular regions. This effective segmentation was also observed in full simulations of the RP, but the Val66Met substitution significantly increased interactions across the linker, as well as the number of participating residues. The Val66Met substitution replaces β-bridging between V66 and V94 (on either side of the linker) with specific side-chain interactions between M66 and M95.
The protein backbone in the vicinity of M95 is then free to form β-bridges with residues 31-41 near the N-terminus, which condenses the protein. A significant role for Met/Met interactions is consistent with previously-observed non-local effects of the Val66Met SNP, as well as established interactions between the Met66 sequence and a Met-rich receptor that initiates neuronal growth cone retraction

    Modelling biomolecules through atomistic graphs: theory, implementation, and applications

    Describing biological molecules through computational models enjoys ever-growing popularity. Never before has access to computational resources been easier for scientists across the natural sciences. The need for accurate, efficient, and robust modelling tools is therefore irrefutable. This, in turn, calls for highly interdisciplinary research, which the thesis presented here is a product of. Through the successful marriage of techniques from mathematical graph theory, theoretical insights from chemistry and biology, and the tools of modern computer science, we are able to computationally construct accurate depictions of biomolecules as atomistic graphs, in which individual atoms become nodes and chemical bonds/interactions are represented by weighted edges. When combined with methods from graph theory and network science, this approach has previously been shown to successfully reveal various properties of proteins, such as dynamics, rigidity, multi-scale organisation, allostery, and protein-protein interactions, and is well poised to set new standards in terms of computational feasibility, multi-scale resolution (from atoms to domains) and time-scales (from nanoseconds to milliseconds). Therefore, building on previous work in our research group spanning over 15 years and to further encourage and facilitate research into this growing field, this thesis's main contribution is to provide a formalised foundation for the construction of atomistic graphs. The most crucial aspect of constructing atomistic graphs of large biomolecules compared to small molecules is the necessity to include a variety of different types of bonds and interactions, because larger biomolecules attain their unique structural layout mainly through weaker interactions, e.g. hydrogen bonds, the hydrophobic effect or π-π interactions. 
Whilst most interaction types are well-studied and have readily available methodology which can be used to construct atomistic graphs, this is not the case for hydrophobic interactions. To fill this gap, the work presented herein includes novel methodology for encoding the hydrophobic effect in atomistic graphs, that accounts for the many-body effect and non-additivity. Then, a standalone software package for constructing atomistic graphs from structural data is presented. Herein lies the heart of this thesis: the combination of a variety of methodologies for a range of bond/interaction types, as well as an implementation that is deterministic, easy-to-use and efficient. Finally, some promising avenues for utilising atomistic graphs in combination with graph theoretical tools such as Markov Stability as well as other approaches such as Multilayer Networks to study various properties of biomolecules are presented.
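The atomistic graph idea described in this abstract (atoms as nodes, bonds and interactions as weighted edges) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the atom labels, and the edge weights are hypothetical and in arbitrary units, and none of it reflects the thesis's software package or its energy model.

```python
def build_atomistic_graph(atoms, interactions):
    """Build an undirected weighted graph as a dict of dicts.

    atoms: iterable of unique atom labels (nodes).
    interactions: iterable of (atom_a, atom_b, weight) tuples (edges).
    Weights of repeated pairs are summed, so several interaction types
    between the same two atoms accumulate on one edge.
    """
    graph = {atom: {} for atom in atoms}
    for a, b, weight in interactions:
        if a == b:
            raise ValueError("self-interactions are not allowed")
        graph[a][b] = graph[a].get(b, 0.0) + weight
        graph[b][a] = graph[b].get(a, 0.0) + weight
    return graph

def weighted_degree(graph, atom):
    """Sum of edge weights attached to one atom."""
    return sum(graph[atom].values())

# Hypothetical fragment: a backbone carbonyl hydrogen-bonded to a water.
# Weights are illustrative placeholders, not real interaction energies.
atoms = ["N", "CA", "C", "O", "HOH"]
interactions = [
    ("N", "CA", 1.0),    # covalent bond
    ("CA", "C", 1.0),    # covalent bond
    ("C", "O", 2.0),     # covalent double bond
    ("O", "HOH", 0.1),   # weaker hydrogen bond
]
graph = build_atomistic_graph(atoms, interactions)
```

Keeping strong and weak interactions on the same weighted edge set is what lets graph-theoretical tools treat them uniformly.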

    INVESTIGATING THE EFFECTS OF IONIC LIQUIDS ON DNA G-QUADRUPLEX AND PROTEIN STRUCTURE USING MOLECULAR DYNAMICS SIMULATIONS

    Nucleic acids and proteins have huge implications in biomedicine and bioengineering; however, their storage instability limits their applicability, and current storage protocols are expensive and globally inaccessible. Finding an alternative biocompatible medium to store nucleic acids and proteins would reduce costs and increase their applicability. Ionic liquids (ILs) are molten salt compounds that have been shown to modulate the stability and activity of nucleic acids and proteins. In this thesis, molecular modeling studies of DNA/RNA and protein structure in ILs will be discussed (Chapter 1) and this method will be used to study the IL effects on the structure of the Pu22 c-MYC DNA G-quadruplex (Chapter 2) and the azurin protein (Chapter 3). ILs have been observed to stabilize/destabilize DNA G-quadruplexes linked to cancer oncogene expression; however, the structural effects of imidazolium-based ILs on G-quadruplexes remain unknown. Bioengineering of azurin is attractive for soil bioremediation, thus understanding the structural changes induced by TMG amino acid-based ILs will inform future IL design for enhancing azurin's activity. In Chapter 2, molecular dynamics (MD) simulations will elucidate the stabilizing mechanism of four imidazolium-based ILs of increasing hydrophobicity to Pu22, using the G-quadruplex stabilizer TMPyP4 as a molecular probe. In Chapter 3, conventional and replica-exchange MD simulations will provide insight into the enthalpic and entropic change induced by two TMG-AA based ILs on the folded and unfolded azurin conformations

    Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

    A protein's tertiary structure plays an important role in determining its possible functional sites and chemical interactions with other related proteins. Experimental methods to determine protein structure are time consuming and expensive. As a result, the gap between known protein sequences and known structures has widened substantially due to high-throughput sequencing techniques. These limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, the clustering system is used to predict local protein structure. At first, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within sequence clusters may influence their structural similarity. Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity and sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters can give highly reliable prediction results and high-quality clusters can give reliable prediction results. In order to improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively.
As a result, we consider using Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVM is not favorable for huge datasets including millions of samples. Therefore, we propose a novel computational model called CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction has been improved noticeably when CSVMs are applied
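The clustering step this abstract builds on can be illustrated with a plain K-means (Lloyd's algorithm) sketch. This is the conventional algorithm, not the improved variant or the CSVM model the abstract describes, and the two-dimensional points below merely stand in for whatever feature vectors encode the sequence segments; the function name, the deterministic first-k initialization, and the data are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means; returns (labels, centroids).

    Initialization simply takes the first k points, which keeps the
    sketch deterministic but is not a recommended strategy in practice.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, k=2)
```

In the approach the abstract outlines, each resulting cluster would then receive its own classifier; that step is omitted here.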

    3D Hydrophobic Moment Vectors as a Tool to Characterize the Surface Polarity of Amphiphilic Peptides

    The interaction of membranes with peptides and proteins is largely determined by their amphiphilic character. Hydrophobic moments of helical segments are commonly derived from their two-dimensional helical wheel projections, and the same is true for β-sheets. However, to the best of our knowledge, there exists no method to describe structures in three dimensions or molecules with irregular shape. Here, we define the hydrophobic moment of a molecule as a vector in three dimensions by evaluating the surface distribution of all hydrophilic and lipophilic regions over any given shape. The electrostatic potential on the molecular surface is calculated based on the atomic point charges. The resulting hydrophobic moment vector is specific for the instantaneous conformation, and it takes into account all structural characteristics of the molecule, e.g., partial unfolding, bending, and side-chain torsion angles. Extended all-atom molecular dynamics simulations are then used to calculate the equilibrium hydrophobic moments for two antimicrobial peptides, gramicidin S and PGLa, under different conditions. We show that their effective hydrophobic moment vectors reflect the distribution of polar and nonpolar patches on the molecular surface and the calculated electrostatic surface potential. A comparison of simulations in solution and in lipid membranes shows how the peptides undergo internal conformational rearrangement upon binding to the bilayer surface. A good correlation with solid-state NMR data indicates that the hydrophobic moment vector can be used to predict the membrane binding geometry of peptides. This method is available as a web application on http://www.ibg.kit.edu/HM/
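The notion of a hydrophobic moment as a three-dimensional vector can be made concrete with a deliberately simplified sketch. The abstract derives its vector from the electrostatic potential on the molecular surface; the code below instead substitutes a much cruder point-based sum, weighting each site's displacement from the centroid by a hydrophobicity value. The function name, the two-site toy molecule, and the hydrophobicity scale are assumptions for illustration and do not reproduce the published method or the web application.

```python
import math

def hydrophobic_moment(positions, hydrophobicities):
    """Point-based 3D hydrophobic moment (simplified stand-in).

    positions: list of (x, y, z) site coordinates.
    hydrophobicities: one value per site; positive = hydrophobic,
    negative = hydrophilic (hypothetical scale).
    Returns sum_i h_i * (r_i - centroid) as a 3-component list.
    """
    n = len(positions)
    centroid = [sum(p[axis] for p in positions) / n for axis in range(3)]
    moment = [0.0, 0.0, 0.0]
    for p, h in zip(positions, hydrophobicities):
        for axis in range(3):
            moment[axis] += h * (p[axis] - centroid[axis])
    return moment

# Toy amphiphile: one hydrophobic site at +x, one hydrophilic site at -x.
positions = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
hydrophobicities = [1.0, -1.0]
moment = hydrophobic_moment(positions, hydrophobicities)
magnitude = math.sqrt(sum(c * c for c in moment))
```

The resulting vector points from the hydrophilic side toward the hydrophobic side, which is the qualitative behavior an amphiphilicity measure should show.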

    Using Structural Bioinformatics to Model and Design Membrane Proteins

    Cells require membrane proteins for a wide spectrum of critical functions. Transmembrane proteins enable cells to communicate with their environment and provide catalysis, ion transport and scaffolding. The functional roles of membrane proteins are specified by their sequence composition and precise three dimensional folding. The exact mechanisms driving folding of membrane proteins are still not fully understood. Further, the association between membrane proteins occurs with pinpoint specificity. For example, there exist common sequence features within families of transmembrane receptors, yet there is little cross talk between families. Therefore, we ask how membrane proteins dial in their specificity and what factors are responsible for adoption of native structure. Advancements in membrane protein structure determination methods have been followed by a sharp increase in three dimensional structures. Structural bioinformatics has been utilized effectively to study water soluble proteins. The field is now entering an era where structural bioinformatics can be applied to modeling membrane proteins without structure and engineering novel membrane proteins. The transmembrane domains of membrane proteins were first categorized structurally. From this analysis, we are able to describe the ways in which membrane proteins fold and associate. We further derived sequence profiles for the commonly occurring structural motifs, enabling us to investigate the role of amino acids within the bilayer. Utilizing these tools, a transmembrane structural model of principal cell surface receptors (integrins) was constructed. The structural model enabled us to understand possible signaling mechanisms and to propose a novel membrane protein packing motif. In addition, novel scoring functions for membrane proteins were developed and applied to modeling membrane proteins. We derived the first all-atom membrane statistical potential and introduced the usage of exposed volume.
These potentials allowed modeling of complex interactions in membrane proteins, such as salt bridges. To understand the geometric preferences of salt bridges, we surveyed a structural database. We learned about large biases in salt bridge orientations that will be useful in modeling and design. Lastly, we combine these structural bioinformatic efforts, enabling us to model membrane proteins in ways which were previously inaccessible