3,572 research outputs found

    Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.

    Get PDF
    BackgroundOne of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.ResultsWe explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complimentary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.ConclusionWhen attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used

    Capturing coevolutionary signals in repeat proteins

    Get PDF
    The analysis of correlations of amino acid occurrences in globular proteins has led to the development of statistical tools that can identify native contacts -- portions of the chains that come to close distance in folded structural ensembles. Here we introduce a statistical coupling analysis for repeat proteins -- natural systems for which the identification of domains remains challenging. We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias reveals true co-evolutionary signals from which local native-contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. The overall procedure can be used to reconstruct the interactions at long distances, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric

    The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

    Get PDF
    The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43 229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616 470 domain sequences classified into 23 876 sequence families. This results in the significant expansion of the CATHHMMmodel library to include models built from the CATH sequence relatives, giving a10%increase in coveragefor detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned

    Comprehensively Surveying Structure and Function of RING Domains from Drosophila melanogaster

    Get PDF
    Using a complete set of RING domains from Drosophila melanogaster, all the solved RING domains and cocrystal structures of RING-containing ubiquitin-ligases (RING-E3) and ubiquitin-conjugating enzyme (E2) pairs, we analyzed RING domains structures from their primary to quarternary structures. The results showed that: i) putative orthologs of RING domains between Drosophila melanogaster and the human largely occur (118/139, 84.9%); ii) of the 118 orthologous pairs from Drosophila melanogaster and the human, 117 pairs (117/118, 99.2%) were found to retain entirely uniform domain architectures, only Iap2/Diap2 experienced evolutionary expansion of domain architecture; iii) 4 evolutionary structurally conserved regions (SCRs) are responsible for homologous folding of RING domains at the superfamily level; iv) besides the conserved Cys/His chelating zinc ions, 6 equivalent residues (4 hydrophobic and 2 polar residues) in the SCRs possess good-consensus and conservation- these 4 SCRs function in the structural positioning of 6 equivalent residues as determinants for RING-E3 catalysis; v) members of these RING proteins located nucleus, multiple subcellular compartments, membrane protein and mitochondrion are respectively 42 (42/139, 30.2%), 71 (71/139, 51.1%), 22 (22/139, 15.8%) and 4 (4/139, 2.9%); vi) CG15104 (Topors) and CG1134 (Mul1) in C3HC4, and CG3929 (Deltex) in C3H2C3 seem to display broader E2s binding profiles than other RING-E3s; vii) analyzing intermolecular interfaces of E2/RING-E3 complexes indicate that residues directly interacting with E2s are all from the SCRs in RING domains. Of the 6 residues, 2 hydrophobic ones contribute to constructing the conserved hydrophobic core, while the 2 hydrophobic and 2 polar residues directly participate in E2/RING-E3 interactions. Based on sequence and structural data, SCRs, conserved equivalent residues and features of intermolecular interfaces were extracted, highlighting the presence of a nucleus for RING domain fold and formation of catalytic core in which related residues and regions exhibit preferential evolutionary conservation

    Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

    Get PDF
    Supplementary sections 1–13, tables S1–S10, and figures S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.National Science Foundation (U.S.) (Graduate Research Fellowship)National Science Foundation (U.S.) (CAREER award NSF 0644282

    Evolutionary descent of prion genes from a ZIP metal ion transport ancestor

    Get PDF
    In the more than 20 years since its discovery, both the phylogenetic origin and cellular function of the prion protein (PrP) have remained enigmatic. Insights into the function of PrP may be obtained through a characterization of its molecular neighborhood. Quantitative interactome data revealed the spatial proximity of a subset of metal ion transporters of the ZIP family to mammalian prion proteins. A subsequent bioinformatic analysis revealed the presence of a prion-like protein sequence within the N-terminal, extracellular domain of a phylogenetic branch of ZIPs. Additional structural threading and ortholog sequence alignment analyses consolidated the conclusion that the prion protein gene family is phylogenetically derived from a ZIP-like ancestor molecule. Our data explain structural and functional features found within mammalian prion proteins as elements of an ancient involvement in the transmembrane transport of divalent cations. The connection to ZIP proteins is expected to open new avenues to elucidate the biology of the prion protein in health and disease

    Efficient protein alignment algorithm for protein search

    Get PDF
    Β© 2010 Lu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution Licens
    • …
    corecore