37 research outputs found

    Considering scores between unrelated proteins in the search database improves profile comparison

    Background: Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of the statistical significance of detected profile similarities is essential for further development of this methodology. Here we analyze a novel approach to estimate the statistical significance of profile similarity: the explicit consideration of background score distributions for each database template (subject). Results: Using a simple scheme to combine and analytically approximate query- and subject-based distributions, we show that (i) inclusion of background distributions for the subjects increases the quality of homology detection; (ii) this increase is higher when the distributions are based on the scores to all known non-homologs of the subject rather than a small calibration subset of the database representatives; and (iii) these distributions of scores to all known non-homologs of the subject make the dominant contribution to the improved performance: adding the calibration distribution of the query has a negligible additional effect. Conclusion: The construction of distributions based on the complete sets of non-homologs for each subject is particularly relevant in the setting of structure prediction, where the database consists of proteins with solved 3D structure (PDB, SCOP, CATH, etc.) and therefore structural relationships between proteins are known. These results point to a potential new direction in the development of more powerful methods for remote homology detection.

    Island method for estimating the statistical significance of profile-profile alignment scores

    Background: In the last decade, a significant improvement in detecting remote similarity between protein sequences has been made by utilizing alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating distribution parameters by random simulations turns out to be computationally very expensive. Results: We demonstrate that the background distribution of profile-profile alignment scores heavily depends on profiles' composition and thus the distribution parameters must be estimated independently, for each pair of profiles of interest. We also show that accurate estimates of statistical parameters can be obtained using the "island statistics" for profile-profile alignments. Conclusion: The island statistics can be generalized to profile-profile alignments to provide an efficient method for the alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method for comparable accuracy in parameter estimates.
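    The Gumbel form mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the island method: it fits the two Gumbel parameters by the plain method of moments to a list of background scores (here simulated), then converts a score to a p-value with P(S >= x) = 1 - exp(-exp(-lambda (x - mu))). The function names and the example parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments fit of Gumbel location (mu) and scale (lam).

    Uses the identities mean = mu + gamma / lam and var = pi^2 / (6 lam^2).
    This is a simple stand-in, not the island estimator from the abstract.
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def gumbel_pvalue(score, mu, lam):
    """P(S >= score) under a Gumbel background with parameters mu, lam."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x.
    return -math.expm1(-math.exp(-lam * (score - mu)))


def sample_gumbel(mu, lam, rng):
    """Draw one Gumbel variate by inverting the CDF (for simulation only)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() is always below 1.0
        u = rng.random()
    return mu - math.log(-math.log(u)) / lam


if __name__ == "__main__":
    rng = random.Random(0)
    background = [sample_gumbel(20.0, 0.3, rng) for _ in range(20000)]
    mu_hat, lam_hat = fit_gumbel_moments(background)
    print(mu_hat, lam_hat, gumbel_pvalue(45.0, mu_hat, lam_hat))
```

    Because the abstract reports that the parameters depend on profile composition, a fit like this would have to be repeated for each pair of profiles, which is the cost the island method is meant to reduce.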

    Detection of distant evolutionary relationships between protein families using theory of sequence profile-profile comparison

    Background: Detection of common evolutionary origin (homology) is a primary means of inferring protein structure and function. At present, comparison of protein families represented as sequence profiles is arguably the most effective homology detection strategy. However, finding the best way to represent evolutionary information of a protein sequence family in the profile, to compare profiles and to estimate the biological significance of such comparisons, remains an active area of research. Results: Here, we present a new homology detection method based on sequence profile-profile comparison. The method has a number of new features including position-dependent gap penalties and a global score system. Position-dependent gap penalties provide a more biologically relevant way to represent and align protein families as sequence profiles. The global score system enables an analytical solution of the statistical parameters needed to estimate the statistical significance of profile-profile similarities. The new method, together with other state-of-the-art profile-based methods (HHsearch, COMPASS and PSI-BLAST), is benchmarked in all-against-all comparison of a challenging set of SCOP domains that share at most 20% sequence identity. For benchmarking, we use a reference ("gold standard") free model-based evaluation framework. Evaluation results show that at the level of protein domains our method compares favorably to all other tested methods. We also provide examples of the new method outperforming structure-based similarity detection and alignment. The implementation of the new method both as a standalone software package and as a web server is available at http://www.ibt.lt/bioinformatics/coma. Conclusion: Due to a number of developments, the new profile-profile comparison method shows an improved ability to match distantly related protein domains. Therefore, the method should be useful for annotation and homology modeling of uncharacterized proteins.

    Direct Reprogramming of Mouse Fibroblasts into Functional Skeletal Muscle Progenitors

    Skeletal muscle harbors quiescent stem cells termed satellite cells and proliferative progenitors termed myoblasts, which play pivotal roles during muscle regeneration. However, current technology does not allow permanent capture of these cell populations in vitro. Here, we show that ectopic expression of the myogenic transcription factor MyoD, combined with exposure to small molecules, reprograms mouse fibroblasts into expandable induced myogenic progenitor cells (iMPCs). iMPCs express key skeletal muscle stem and progenitor cell markers including Pax7 and Myf5 and give rise to dystrophin-expressing myofibers upon transplantation in vivo. Notably, a subset of transplanted iMPCs maintain Pax7 expression and sustain serial regenerative responses. Similar to satellite cells, iMPCs originate from Pax7+ cells and require Pax7 itself for maintenance. Finally, we show that myogenic progenitor cell lines can be established from muscle tissue following small-molecule exposure alone. This study thus reports on a robust approach to derive expandable myogenic stem/progenitor-like cells from multiple cell types.

    Efficient Identification of Critical Residues Based Only on Protein Structure by Network Analysis

    Despite the increasing number of published protein structures, and the fact that each protein's function relies on its three-dimensional structure, there is limited access to automatic programs used for the identification of critical residues from the protein structure, compared with those based on protein sequence. Here we present a new algorithm based on network analysis applied exclusively to protein structures to identify critical residues. Our results show that this method identifies critical residues for protein function with high reliability and improves on automatic sequence-based approaches and previous network-based approaches. The reliability of the method depends on the conformational diversity screened for the protein of interest. We have designed a web site to give access to this software at http://bis.ifc.unam.mx/jamming/. In summary, a new method is presented that relates critical residues for protein function with the most traversed residues in networks derived from protein structures. A unique feature of the method is the inclusion of the conformational diversity of proteins in the prediction, thus reproducing a basic feature of the structure/function relationship of proteins.
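    The idea of "most traversed residues" in a structure-derived network can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the paper: residues are represented by one coordinate each (for example a C-alpha atom), two residues are linked when they lie within a hypothetical 7.0 Å cutoff, and "traversal" is measured by shortest-path betweenness centrality computed with Brandes' algorithm. The function names and the cutoff are invented for the example.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=7.0):
    """Link residues whose representative atoms are within `cutoff` (assumed value)."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes) for an undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


if __name__ == "__main__":
    # Toy chain of five residues spaced 5 Å apart along the x axis.
    chain = [(5.0 * i, 0.0, 0.0) for i in range(5)]
    scores = betweenness(contact_graph(chain))
    print(sorted(scores, key=scores.get, reverse=True))
```

    The abstract stresses conformational diversity; one way to reflect that in a sketch like this would be to average the scores over several conformers of the same protein, which is again an assumption about how the idea could be applied.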

    The Presence of the Iron-Sulfur Motif Is Important for the Conformational Stability of the Antiviral Protein, Viperin

    Viperin, an antiviral protein, has been shown to contain a CX3CX2C motif, which is conserved in the radical S-adenosyl-methionine (SAM) enzyme family. A triple mutant which replaces these three cysteines with alanines has been shown to have a severe deficiency in antiviral activity. Since the crystal structure of Viperin is not available, we have used a combination of computational methods including multi-template homology modeling and molecular dynamics simulation to develop a low-resolution predicted structure. The results show that Viperin is an α-β protein containing an iron-sulfur cluster at the center pocket. The calculations suggest that the removal of the iron-sulfur cluster would lead to collapse of the protein tertiary structure. To verify these predictions, we have prepared, expressed and purified four mutant proteins. In three mutants individual cysteine residues were replaced by alanine residues while in the fourth all the cysteines were replaced by alanines. Conformational analyses using circular dichroism and steady state fluorescence spectroscopy indicate that the mutant proteins are partially unfolded, conformationally unstable and aggregation prone. The lack of conformational stability of the mutant proteins may have direct relevance to the absence of their antiviral activity.

    Of Bits and Bugs — On the Use of Bioinformatics and a Bacterial Crystal Structure to Solve a Eukaryotic Repeat-Protein Structure

    Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding of how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key to success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.

    An analysis of single amino acid repeats as use case for application specific background models

    Background: Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results: Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. Conclusions: Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
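    As a rough illustration of the classical Markov-chain background that this abstract uses as its baseline, the sketch below estimates first-order transition probabilities from a set of training sequences and derives the chance that a run of one residue, once started, reaches a given length. Under a first-order chain that probability is P(a|a)^(k-1). The function names are hypothetical, no pseudocounts are applied, and the regression models favored in the abstract are not reproduced here.

```python
from collections import Counter


def transition_probs(sequences):
    """Maximum-likelihood first-order transition probabilities P(b | a).

    Returns a dict keyed by (a, b). Transitions never seen in the
    training sequences are simply absent (no pseudocounts, by assumption).
    """
    pair_counts = Counter()
    from_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {pair: c / from_counts[pair[0]] for pair, c in pair_counts.items()}


def run_tail_prob(probs, residue, k):
    """P(run length >= k | a run of `residue` has just started)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    p_stay = probs.get((residue, residue), 0.0)
    return p_stay ** (k - 1)


if __name__ == "__main__":
    training = ["MLLLLAVLLG", "MKLLLLLSA"]  # made-up toy sequences
    probs = transition_probs(training)
    print(run_tail_prob(probs, "L", 4))
```

    Changing the training set changes the estimated P(a|a) and therefore how surprising a given run appears, which mirrors the abstract's point that the reference data define the background.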