47 research outputs found
Protein structure prediction and structure-based protein function annotation
Nature tends to modify rather than invent the function of protein molecules, and the log of the modifications is encrypted in the gene sequence. Analysis of these modification events in evolutionarily related genes is important for assigning function to the hypothetical genes and gene products surging in databases, and for improving our understanding of the bioverse. However, random mutations occurring during evolution chisel the sequence to an extent that both decrypting these codes and identifying evolutionary relatives from sequence alone become difficult. Thankfully, even after many changes at the sequence level, protein three-dimensional structures are often conserved, and hence protein structural similarity usually provides more clues on the evolution of functionally related proteins. In this dissertation, I study the design of three bioinformatics modules that form a new hierarchical approach for structure prediction and function annotation of proteins based on the sequence-to-structure-to-function paradigm. First, we design an online platform for structure prediction of protein molecules using multiple threading alignments and iterative structural assembly simulations (I-TASSER). I review the components of this module and have added features that provide function annotation to the protein sequences and help to combine experimental and biological data for improving the structure modeling accuracy. The online service of the system has been supporting more than 20,000 biologists from over 100 countries. Next, we design a new comparative approach (COFACTOR) to identify the location of ligand binding sites on these modeled protein structures and spot the functional residue constellations using an innovative global-to-local structural alignment procedure and functional sites in known protein structures. 
Based on both large-scale benchmarking and blind tests (CASP), the method demonstrates significant advantages over the state-of-the-art methods of the field in recognizing ligand-binding residues for both metal and non-metal ligands. The major advantage of the method is the optimal combination of the local and global protein structural alignments, which helps to recognize functionally conserved structural motifs among proteins that have taken different evolutionary paths. We further extend the COFACTOR global-to-local approach to annotate the Gene Ontology terms and enzyme classifications of protein molecules. Here, we added two new components to COFACTOR. First, we developed a new global structural match algorithm that allows a better structural search. Second, a sensitive technique was proposed for constructing local 3D-signature motifs of template proteins that lack known functional sites, which allows us to perform query-template local structural similarity comparisons with all template proteins. A scoring scheme that combines the confidence score of structure prediction with the global-local similarity score is used for assigning a confidence score to each predicted function. Large-scale benchmarking shows that the predicted functions have remarkably improved precision and recall rates and also higher prediction coverage than the state-of-the-art sequence-based methods. To explore the applicability of the method for real-world cases, we applied the method to a subset of ORFs from Chlamydia trachomatis, and the functional annotations provided new testable hypotheses for improving the understanding of this phylogenetically distinct bacterium
Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and fragment-guided molecular dynamics (FG-MD), were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of the predicted model, including hydrogen-bonding networks, torsion angles, and steric clashes. Despite considerable progress in both template-based and template-free structure modeling, significant improvements in protein target classification, domain parsing, model selection, and ab initio folding of β-proteins are still needed to further improve the I-TASSER pipeline. Proteins 2011; © 2011 Wiley-Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/88077/1/23111_ftp.pd
Parkin, A Top Level Manager in the Cell's Sanitation Department
Parkin belongs to a class of multiple RING domain proteins designated as RBR (RING, in between RING, RING) proteins. In this review we examine what is known regarding the structure/function relationship of the Parkin protein. Parkin contains three RING domains plus a ubiquitin-like domain and an in-between-RING (IBR) domain. RING domains are rich in cysteine amino acids that act as ligands to bind zinc ions. RING domains may interact with DNA or with other proteins and perform a wide range of functions. Some function as E3 ubiquitin ligases, participating in attachment of ubiquitin chains to signal proteasome degradation; however, ubiquitin may be attached for purposes other than proteasome degradation.
It was determined that the C-terminal-most RING, RING2, is essential for Parkin to function as an E3 ubiquitin ligase, and a number of substrates have been identified. However, Parkin also participates in a number of other functions, such as DNA repair, microtubule stabilization, and formation of aggresomes. Some functions, such as participation in a multiprotein complex implicated in NMDA activity at the postsynaptic density, do not require ubiquitination of substrate molecules. Recent observations of RING proteins suggest their function may be regulated by zinc ion binding. We have modeled the three RING domains of Parkin and have identified a new set of RING2 ligands. This set allows for binding of two zinc ions rather than just one, opening the possibility that the number of zinc ions bound acts as a molecular switch to modulate Parkin function
A Protocol for Computer-Based Protein Structure and Function Prediction
Genome sequencing projects have ciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score which tells how accurate the predictions are without knowing the experimental data. To facilitate the special requests of end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. The structural information could be collected by the users based on experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was evaluated as the best program for protein structure and function predictions in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server
The role of covalent dimerization on the physical and chemical stability of the EC1 domain of human E-cadherin
The objective of this work was to evaluate the solution stability of the EC1 domain of E-cadherin under various conditions. The EC1 domain was incubated at various temperatures (4, 37, and 70 °C) and pH values (3.0, 7.0, and 9.0). At pH 9.0 and 37 or 70 °C, a significant loss of EC1 was observed due to precipitation and a hydrolysis reaction. The degradation was suppressed upon addition of DTT, suggesting that the formation of EC1 dimer facilitated the EC1 degradation. At 4 °C and various pH values, the EC1 secondary and tertiary structures showed changes upon incubation for up to 28 days, and DTT prevented any structural changes over 28 days of incubation. Molecular dynamics simulations indicated that the dimer of EC1 has higher mobility than does the monomer; this higher mobility of the EC1 dimer may contribute to instability of the EC1 domain
QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs
The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein-protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated
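The abstract above says the three information sources are "combined by consensus averaging" but gives no formula. The following is a minimal sketch of one plausible reading, assuming each source yields a score in [0, 1] per GO term and that a term missing from a source contributes 0. The function name, the equal weighting, and the example scores are illustrative assumptions, not the published QAUST implementation.

```python
from collections import defaultdict


def consensus_average(source_scores):
    """Average per-term scores over all sources.

    Assumption (not from the abstract): every source has equal weight and
    a term that a source does not report counts as a score of 0.
    """
    if not source_scores:
        raise ValueError("at least one source is required")
    totals = defaultdict(float)
    for scores in source_scores:
        for term, score in scores.items():
            totals[term] += score
    n_sources = len(source_scores)
    return {term: total / n_sources for term, total in totals.items()}


# Hypothetical per-source scores for one query protein.
structure_scores = {"GO:0003824": 0.9, "GO:0005515": 0.6}
interaction_scores = {"GO:0005515": 0.9}
motif_scores = {"GO:0003824": 0.6, "GO:0008270": 0.3}

consensus = consensus_average(
    [structure_scores, interaction_scores, motif_scores]
)

# Rank terms from highest to lowest consensus score.
ranked = sorted(consensus.items(), key=lambda item: item[1], reverse=True)
for term, score in ranked:
    print(f"{term}\t{score:.2f}")
```

Treating a missing term as 0 penalises predictions supported by only one source. A weighted average, or averaging only over the sources that report a term, would be equally simple variations.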
Community-wide assessment of GPCR structure modelling and ligand docking: GPCR Dock 2008
Recent breakthroughs in the determination of the crystal structures of G protein-coupled receptors (GPCRs) have provided new opportunities for structure-based drug design strategies targeting this protein family. With the aim of evaluating the current status of GPCR structure prediction and ligand docking, a community-wide, blind prediction assessment - GPCR Dock 2008 - was conducted in coordination with the publication of the crystal structure of the human adenosine A2A receptor bound to the ligand ZM241385. Twenty-nine groups submitted 206 structural models before the release of the experimental structure, which were evaluated for the accuracy of the ligand binding mode and the overall receptor model compared with the crystal structure. This analysis highlights important aspects for success and future development, such as accurate modelling of structurally divergent regions and use of additional biochemical insight such as disulphide bridges in the extracellular loops
Status of GPCR modeling and docking as reflected by community-wide GPCR Dock 2010 assessment
The community-wide GPCR Dock assessment is conducted to evaluate the status of molecular modeling and ligand docking for human G protein-coupled receptors. The present round of the assessment was based on the recent structures of dopamine D3 and CXCR4 chemokine receptors bound to small molecule antagonists and CXCR4 with a synthetic cyclopeptide. Thirty-five groups submitted their receptor-ligand complex structure predictions prior to the release of the crystallographic coordinates. With closely related homology modeling templates, as for dopamine D3 receptor, and with incorporation of biochemical and QSAR data, modern computational techniques predicted complex details with accuracy approaching experimental. In contrast, CXCR4 complexes that had less-characterized interactions and only distant homology to the known GPCR structures still remained very challenging. The assessment results provide guidance for modeling and crystallographic communities in method development and target selection for further expansion of the structural coverage of the GPCR universe. © 2011 Elsevier Ltd. All rights reserved
HAAD: A Quick Algorithm for Accurate Prediction of Hydrogen Atoms in Protein Structures
Hydrogen atoms constitute nearly half of all atoms in proteins, and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost of HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of hydrogen addition should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD
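The accuracy figures above are reported as the RMSD between predicted and experimentally observed hydrogen positions. A minimal sketch of that metric follows, assuming both coordinate sets are already in the same frame (the heavy atoms are shared, so no superposition is applied) and that atoms are paired by list order. The function name and the toy coordinates are hypothetical and are not taken from the HAAD source code.

```python
import math


def hydrogen_rmsd(predicted, reference):
    """Root-mean-square deviation between paired 3D atom positions.

    Assumes both lists hold (x, y, z) tuples in the same coordinate frame
    and in the same atom order; no structural superposition is done.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for pred_atom, ref_atom in zip(predicted, reference):
        squared_sum += sum((p - r) ** 2 for p, r in zip(pred_atom, ref_atom))
    return math.sqrt(squared_sum / len(predicted))


# Toy example: two hydrogens, one displaced by 1 angstrom along z.
predicted_h = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_h = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]

value = hydrogen_rmsd(predicted_h, reference_h)
print(f"RMSD = {value:.3f} angstrom")
```

Because the deviation is squared before averaging, a single badly placed hydrogen raises the RMSD more than several small errors of the same total size.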
Spatial, temporal, and demographic patterns in prevalence of chewing tobacco use in 204 countries and territories, 1990-2019 : a systematic analysis from the Global Burden of Disease Study 2019
Summary. Background: Chewing tobacco and other types of smokeless tobacco use have had less attention from the global health community than smoked tobacco use. However, the practice is popular in many parts of the world and has been linked to several adverse health outcomes. Understanding trends in prevalence with age, over time, and by location and sex is important for policy setting and in relation to monitoring and assessing commitment to the WHO Framework Convention on Tobacco Control. Methods: We estimated prevalence of chewing tobacco use as part of the Global Burden of Diseases, Injuries, and Risk Factors Study 2019 using a modelling strategy that used information on multiple types of smokeless tobacco products. We generated a time series of prevalence of chewing tobacco use among individuals aged 15 years and older from 1990 to 2019 in 204 countries and territories, including age-sex specific estimates. We also compared these trends to those of smoked tobacco over the same time period. Findings: In 2019, 273·9 million (95% uncertainty interval 258·5 to 290·9) people aged 15 years and older used chewing tobacco, and the global age-standardised prevalence of chewing tobacco use was 4·72% (4·46 to 5·01). 228·2 million (213·6 to 244·7; 83·29% [82·15 to 84·42]) chewing tobacco users lived in the south Asia region. Prevalence among young people aged 15-19 years was over 10% in seven locations in 2019. Although global age-standardised prevalence of smoking tobacco use decreased significantly between 1990 and 2019 (annualised rate of change: -1·21% [-1·26 to -1·16]), similar progress was not observed for chewing tobacco (0·46% [0·13 to 0·79]). Among the 12 highest prevalence countries (Bangladesh, Bhutan, Cambodia, India, Madagascar, Marshall Islands, Myanmar, Nepal, Pakistan, Palau, Sri Lanka, and Yemen), only Yemen had a significant decrease in the prevalence of chewing tobacco use, which was among males between 1990 and 2019 (-0·94% [-1·72 to -0·14]), compared with nine of 12 countries that had significant decreases in the prevalence of smoking tobacco. Among females, none of these 12 countries had significant decreases in prevalence of chewing tobacco use, whereas seven of 12 countries had a significant decrease in the prevalence of tobacco smoking use for the period. Interpretation: Chewing tobacco remains a substantial public health problem in several regions of the world, and predominantly in south Asia. We found little change in the prevalence of chewing tobacco use between 1990 and 2019, and that control efforts have had much larger effects on the prevalence of smoking tobacco use than on chewing tobacco use in some countries. Mitigating the health effects of chewing tobacco requires stronger regulations and policies that specifically target use of chewing tobacco, especially in countries with high prevalence. Copyright (c) 2021 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Peer reviewe
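The abstract above quotes an "annualised rate of change" in prevalence without stating how it is computed. One commonly used definition is the log ratio of the end and start values divided by the number of years, expressed as a percentage. The sketch below assumes that definition; the function name and the input values are hypothetical and do not reproduce the study's estimates.

```python
import math


def annualised_rate_of_change(start_value, end_value, years):
    """Annualised rate of change in percent per year.

    Assumed definition (not given in the abstract):
        100 * ln(end_value / start_value) / years
    """
    if start_value <= 0 or end_value <= 0:
        raise ValueError("values must be positive")
    if years <= 0:
        raise ValueError("years must be positive")
    return 100.0 * math.log(end_value / start_value) / years


# Hypothetical prevalence values (percent) for 1990 and 2019.
prevalence_1990 = 4.0
prevalence_2019 = 4.5

arc = annualised_rate_of_change(
    prevalence_1990, prevalence_2019, years=2019 - 1990
)
print(f"Annualised rate of change: {arc:.2f}% per year")
```

A positive result means prevalence rose over the period and a negative one means it fell, which matches the sign convention of the figures quoted in the abstract.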