366 research outputs found

    Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

    We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA-binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids.

    SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants

    Single nucleotide variants (SNVs) are, together with copy number variation, the primary source of variation in the human genome and are associated with phenotypic variation such as altered response to drug treatment and susceptibility to disease. Linking structural effects of non-synonymous SNVs to functional outcomes is a major issue in structural bioinformatics. The SNPeffect database (http://snpeffect.switchlab.org) uses sequence- and structure-based bioinformatics tools to predict the effect of protein-coding SNVs on the structural phenotype of proteins. It integrates aggregation prediction (TANGO), amyloid prediction (WALTZ), chaperone-binding prediction (LIMBO) and protein stability analysis (FoldX) for structural phenotyping. Additionally, SNPeffect holds information on affected catalytic sites and a number of post-translational modifications. The database contains all known human protein variants from UniProt, but users can now also submit custom protein variants for a SNPeffect analysis, including automated structure modeling. The new meta-analysis application allows plotting correlations between phenotypic features for a user-selected set of variants.

    Protein Intrinsic Disorder Detection Based on Structural Features

    Structural Bioinformatics is a branch of science that involves the analysis of three-dimensional structures of molecules using computer science techniques. Initially, the primary focus was on proteins with a fixed three-dimensional structure. However, researchers in the last 20 years have shifted their attention to Intrinsically Disordered Proteins (IDPs), which are proteins containing disordered regions that exhibit highly heterogeneous conformations. The work in this thesis is centered around recognizing IDPs from sequences through the extraction of features that indicate protein disorder. This work presents a software tool, AlphaFold-disorder (SASA), developed by implementing PSEA and SASA algorithms. Subsequently, the quality of results produced by the new software tool was compared with state-of-the-art software. The development process involved three major procedures: the implementation of the PSEA procedure for predicting secondary structures based on three-dimensional coordinates of amino acids; the implementation of the SASA procedure for computing RSA (Relative Solvent Accessibility) of amino acids using the SASA library; and the implementation of the FoldComp procedure for managing .fcz files, which are compressed protein files. To assess the quality of the results, the dataset was initially plotted to gain insights into the distribution of features. Machine learning models were then implemented. Finally, ROC and Precision-Recall curves between AlphaFold-disorder, AlphaFold-disorder (SASA), and the best machine learning model were compared. The comparison revealed that AlphaFold-disorder (SASA) predictions are on par with AlphaFold-disorder ones, while the machine learning model requires more training data to surpass their predictions.
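The RSA step mentioned in this abstract can be illustrated with a minimal sketch. It assumes RSA is the ratio of a residue's measured solvent-accessible surface area to a reference maximum for that residue type, capped at 1. The function name and the `MAX_SASA` table are hypothetical, and the reference values are illustrative placeholders rather than the ones used in the thesis.

```python
# Hypothetical reference maxima in square angstroms, for illustration only.
# A real pipeline would substitute a complete published scale.
MAX_SASA = {"ALA": 129.0, "GLY": 104.0, "TRP": 285.0}


def relative_solvent_accessibility(residue_name, sasa):
    """Return SASA divided by the reference maximum, capped at 1.0."""
    max_sasa = MAX_SASA[residue_name.upper()]
    return min(sasa / max_sasa, 1.0)


if __name__ == "__main__":
    # A half-exposed alanine and a glycine whose SASA exceeds the reference.
    print(relative_solvent_accessibility("ALA", 64.5))
    print(relative_solvent_accessibility("GLY", 208.0))
```

Capping at 1.0 is a design choice for this sketch: measured areas can exceed a reference maximum, and a bounded value is easier to use as a classifier feature.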

    Protein interface prediction using graph convolutional networks

    2017 Fall. Includes bibliographical references. Proteins play a critical role in processes both within and between cells, through their interactions with each other and other molecules. Proteins interact via an interface forming a protein complex, which is difficult, expensive, and time-consuming to determine experimentally, giving rise to computational approaches. These computational approaches utilize known electrochemical properties of protein amino acid residues in order to predict if they are a part of an interface or not. Prediction can occur in a partner-independent fashion, where amino acid residues are considered independently of their neighbor, or in a partner-specific fashion, where pairs of potentially interacting residues are considered together. Ultimately, prediction of protein interfaces can help illuminate cellular biology, improve our understanding of diseases, and aid pharmaceutical research. Interface prediction has historically been performed with a variety of methods, including docking, template matching, and more recently, machine learning approaches. The field of machine learning has undergone a revolution of sorts with the emergence of convolutional neural networks as the leading method of choice for a wide swath of tasks. Enabled by large quantities of data and the increasing power and availability of computing resources, convolutional neural networks efficiently detect patterns in grid-structured data and generate hierarchical representations that prove useful for many types of problems. This success has motivated the work presented in this thesis, which seeks to improve upon state-of-the-art interface prediction methods by incorporating concepts from convolutional neural networks. Proteins are inherently irregular, so they don't easily conform to a grid structure, whereas a graph representation is much more natural. Various convolution operations have been proposed for graph data, each geared towards a particular application. We adapted these convolutions for use in interface prediction, and proposed two new variants. Neural networks were trained on the Docking Benchmark Dataset version 4.0 complexes and tested on the new complexes added in version 5.0. Results were compared against the state-of-the-art partner-specific method, PAIRpred [1]. Results show that multiple variants of graph convolution outperform PAIRpred, with no method emerging as the clear winner. In the future, additional training data may be incorporated from other sources, unsupervised pretraining such as autoencoding may be employed, and a generalization of convolution to simplicial complexes may also be explored. In addition, the various graph convolution approaches may be applied to other applications with graph-structured data, such as Quantitative Structure Activity Relationship (QSAR) learning, and knowledge base inference.
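To make the idea of convolution on a residue graph concrete, here is a minimal sketch of one generic layer. It is not the operator proposed in the thesis. It assumes each residue is a node with a feature vector, and that a node's output is a ReLU applied to its own transformed features plus the mean of its neighbors' transformed features. All names (`graph_conv_layer`, `w_center`, `w_neigh`) are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def graph_conv_layer(features, neighbors, w_center, w_neigh, bias):
    """Apply one mean-aggregation graph convolution followed by ReLU.

    features:  list of per-node feature vectors
    neighbors: dict mapping node index to a list of neighbor indices
    """
    outputs = []
    for i, x in enumerate(features):
        center = matvec(w_center, x)
        aggregated = [0.0] * len(center)
        nbrs = neighbors.get(i, [])
        for j in nbrs:
            contribution = matvec(w_neigh, features[j])
            aggregated = [a + c for a, c in zip(aggregated, contribution)]
        if nbrs:
            # Averaging keeps the output scale independent of node degree.
            aggregated = [a / len(nbrs) for a in aggregated]
        outputs.append(
            [max(0.0, c + a + b) for c, a, b in zip(center, aggregated, bias)]
        )
    return outputs


if __name__ == "__main__":
    feats = [[1.0], [3.0], [5.0]]
    nbrs = {0: [1, 2], 1: [0], 2: []}
    print(graph_conv_layer(feats, nbrs, [[1.0]], [[1.0]], [0.0]))
```

For partner-specific prediction, one plausible use is to run such layers on each protein separately and then score pairs of residue representations, but that pairing step is outside this sketch.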

    Structural Studies of the Anti-HIV Human Protein APOBEC3G Catalytic Domain: A Dissertation

    HIV/AIDS is a disease of grave global importance with over 33 million people infected worldwide and nearly 2 million deaths each year. The rapid emergence of drug resistance, due to viral mutation, renders anti-retroviral drug candidates ineffective with alarming speed and regularity. Instead of targeting mutation-prone viral proteins, an alternative approach is to target host proteins that interact with viral proteins and are critical for the HIV life-cycle. APOBEC3G is a host anti-HIV restriction factor that can exert tremendous negative pressure by hypermutating the viral genome and has the potential to be a promising candidate for anti-retroviral therapeutic research. The work presented in this thesis is focused on investigating the A3G catalytic domain structure and implications of various observed structural features for biological function. High-resolution crystal structures of the A3G catalytic domain were solved using data from macromolecular X-ray crystallographic experiments, revealing a novel intermolecular zinc-coordinating motif unique to A3G. Major intermolecular interfaces observed in the crystal structure were investigated for relevance to biochemical activity and biological function. Co-crystallization with a small-molecule A3G inhibitor, discovered using high-throughput screening assays, revealed a cysteine residue near the active site that is critical for inhibition of catalytic activity by catechol moieties. The serendipitous discovery of covalent interactions between this inhibitor and a surface cysteine residue led to further biochemical experiments that revealed the other cysteine, near the active site, to be critical for inhibition. Computational modeling was used to propose a steric-hindrance-based mechanism of action that was supported by mutational experiments. Structures of other human APOBEC3 homologs were modeled using in silico methods and examined for similarities and differences with A3G catalytic domain crystal structures. Comparisons based on these homology models suggest putative structural features that may endow substrate specificity and other characteristics to the APOBEC3 family members.

    Computational Investigations of Biomolecular Mechanisms in Genomic Replication, Repair and Transcription

    High-fidelity maintenance of the genome is imperative to ensuring stability and proliferation of cells. The genetic material (DNA) of a cell faces a constant barrage of metabolic and environmental assaults throughout its lifetime, ultimately leading to DNA damage. Left unchecked, DNA damage can result in genomic instability, inviting a cascade of mutations that initiate cancer and other aging disorders. Thus, a large area of focus has been dedicated to understanding how DNA is damaged, repaired, expressed and replicated. At the heart of these processes lie complex macromolecular dynamics coupled with intricate protein-DNA interactions. Through advanced computational techniques it has become possible to probe these mechanisms at the atomic level, providing a physical basis to describe biomolecular phenomena. To this end, we have performed studies aimed at elucidating the dynamics and interactions intrinsic to the functionality of biomolecules critical to maintaining genomic integrity: modeling the DNA editing mechanism of DNA polymerase III, uncovering the DNA damage recognition/repair mechanism of thymine DNA glycosylase and linking genetic disease to the functional dynamics of the pre-initiation complex transcription machinery. Collectively, our results elucidate the dynamic interplay between proteins and DNA, further broadening our understanding of these complex processes involved with genomic maintenance.

    Computational approaches to study transcriptional regulation in the human genome

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 22-02-200

    Bioinformatic studies of small disulphide-rich proteins (SDPs)

    Ph.D. Doctor of Philosophy

    Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

    Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on the frequency of change of each amino acid by another one and the propensity of each to remain unchanged. We propose that these replacement rules are recoverable from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu; with Ala, Cys, His, Gly, Ser, Pro, and Thr as structurally idiosyncratic residues. We also found a high average correlation ($\overline{R} = 0.85$) between thirty amino acid mutability scales and the mutational inertia ($I_X$), which measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability privileges secondary structural similarity, and (b) amino acid mutability depends directly on biosynthetic energy cost, and inversely on frequency. These two principles are the underlying rules governing the observed amino acid substitutions. © 2017 The Author(s)
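The average correlation reported in this abstract can be illustrated with a short sketch. It assumes the figure is the mean of Pearson correlation coefficients, one per mutability scale, each computed against the mutational inertia over the same set of amino acids. The abstract does not state which correlation measure was used, so this is an assumption, and the function names and toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation(scales, inertia):
    """Average the per-scale correlations against the inertia values."""
    return sum(pearson(scale, inertia) for scale in scales) / len(scales)


if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    inertia = [1.0, 2.0, 3.0, 4.0]
    scales = [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 2.5, 4.5]]
    print(mean_correlation(scales, inertia))
```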