1,796 research outputs found

    An Alternative Model of Amino Acid Replacement

    The observed correlations between pairs of homologous protein sequences are typically explained in terms of a Markovian dynamic of amino acid substitution. This model assumes that every location on the protein sequence has the same background distribution of amino acids, an assumption that is incompatible with the observed heterogeneity of protein amino acid profiles and with the success of profile multiple sequence alignment. We propose an alternative model of amino acid replacement during protein evolution based upon the assumption that the variation of the amino acid background distribution from one residue to the next is sufficient to explain the observed sequence correlations of homologs. The resulting dynamical model of independent replacements drawn from heterogeneous backgrounds is simple and consistent, and provides a unified homology match score for sequence-sequence, sequence-profile and profile-profile alignment. Comment: Minor improvements. Added figure and reference.
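    The central claim above, that position-specific backgrounds alone can make homologs look correlated, can be illustrated with a toy calculation. The sketch below is not the paper's model or its match score: the four-letter alphabet, the probabilities, and the function names are all made up for illustration. It compares the chance that two independent draws agree when each position has its own background against the chance under a single pooled background.

```python
def expected_identity(backgrounds):
    """Mean probability, over positions, that two independent draws
    from the same position-specific background give the same residue."""
    per_position = [sum(p * p for p in bg.values()) for bg in backgrounds]
    return sum(per_position) / len(per_position)


def pooled_background(backgrounds):
    """Average the position-specific backgrounds into one distribution,
    mimicking the 'same background everywhere' assumption."""
    letters = {a for bg in backgrounds for a in bg}
    n = len(backgrounds)
    return {a: sum(bg.get(a, 0.0) for bg in backgrounds) / n for a in letters}


# Hypothetical backgrounds for four positions (toy four-letter alphabet).
backgrounds = [
    {"L": 0.70, "V": 0.20, "D": 0.05, "K": 0.05},
    {"L": 0.05, "V": 0.05, "D": 0.70, "K": 0.20},
    {"L": 0.20, "V": 0.70, "D": 0.05, "K": 0.05},
    {"L": 0.05, "V": 0.05, "D": 0.20, "K": 0.70},
]

heterogeneous = expected_identity(backgrounds)
homogeneous = expected_identity([pooled_background(backgrounds)])
print(f"heterogeneous: {heterogeneous:.3f}, homogeneous: {homogeneous:.3f}")
```

    With these toy numbers the two sequences agree far more often than a pooled background predicts, even though no residue was ever "substituted" from one into the other.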

    EVolver: An optimization engine for evolving protein sequences to stabilize the respective structures

    Background: Many structural bioinformatics approaches employ sequence profile-based threading techniques. To improve fold recognition rates, homology searching may include artificially evolved amino acid sequences, which were demonstrated to enhance the sensitivity of protein threading in targeting midnight zone templates. Findings: We describe implementation details of eVolver, an optimization algorithm that evolves protein sequences to stabilize the respective structures by a variety of potentials, which are compatible with those commonly used in protein threading. In a case study focusing on the LARG PDZ domain, we show that artificially evolved sequences are highly capable of recognizing the correct protein structures using standard sequence profile-based fold recognition. Conclusions: Computationally designed protein sequences can be incorporated in existing sequence profile-based threading approaches to increase their sensitivity. They also provide a desired linkage between protein structure and function in in silico experiments that relate to, e.g., the completeness of protein structure space, the origin of folds and the protein universe. eVolver is freely available as a user-friendly webserver and a well-documented stand-alone software distribution at http://www.brylinski.org/evolver. © 2013 Brylinski; licensee BioMed Central Ltd.

    Sequence Profile of the Parallel β Helix in the Pectate Lyase Superfamily

    The parallel β helix structure found in the pectate lyase superfamily has been analyzed in detail. A comparative analysis of known structures has revealed a unique sequence profile, with a strong positional preference for specific amino acids oriented toward the interior of the parallel β helix. Using the unique sequence profile, search patterns have been constructed and applied to the sequence databases to identify a subset of proteins that are likely to fold into the parallel β helix. Of the 19 families identified, 39% are known to be carbohydrate-binding proteins, and 50% belong to a broad category of proteins with sequences containing leucine-rich repeats (LRRs). The most striking result is the sequence match between the search pattern and four contiguous segments of internalin A, a surface protein from the bacterial pathogen Listeria monocytogenes. A plausible model of the repetitive LRR sequences of internalin A has been constructed and favorable 3D–1D profile scores have been calculated. Moreover, spectroscopic features characteristic of the parallel β helix topology in the pectate lyases are present in the circular dichroic spectrum of internalin A. Altogether, the data support the hypothesis that sequence search patterns can be used to identify proteins, including a subset of LRR proteins, that are likely to fold into the parallel β helix.

    SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

    The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). By searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability on a varying number of coprocessors, and, when using four coprocessors, is also superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores, with maximum speedups of 1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of SIMD intrinsics), and is freely available at http://swaphi.sourceforge.net. Comment: A short version of this paper has been accepted by the IEEE ASAP 2014 conference.
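    For readers unfamiliar with the quantities in this abstract, here is a minimal, unoptimized sketch of the Smith-Waterman recurrence and of the GCUPS metric. It is not SWAPHI's implementation: it uses a flat match/mismatch score and a linear gap penalty chosen for illustration, whereas protein searches normally use a substitution matrix and affine gaps, and it has none of the SIMD or multi-core parallelism the paper describes. The function names and parameter values are assumptions.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Fills the dynamic-programming matrix one row at a time, so the work
    is len(query) * len(target) cell updates (the quadratic cost).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Billions of matrix cells updated per second for one search."""
    return query_len * database_residues / seconds / 1e9


print(smith_waterman_score("XXACDEXX", "ACDE"))
print(gcups(1000, 1_000_000_000, 10.0))
```

    Read this way, the two throughput figures quoted in the abstract imply a speedup of roughly 228.4 / 58.8 ≈ 3.9 when going from one coprocessor to four.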

    The Phyre2 web portal for protein modeling, prediction and analysis

    Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

    MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

    Sequence-based protein homology detection has been extensively studied and so far the most sensitive method is based upon comparison of protein sequence profiles, which are derived from multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov Random Fields (MRF) representation of a protein family; 2) a scoring function measuring similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning two MRFs. Compared to HMMs, which can only model very short-range residue correlations, MRFs can model long-range residue interaction patterns and thus encode information for the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection should be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM or PSSM-based methods in terms of both alignment accuracy and remote homology detection and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. Comment: Accepted by both RECOMB 2014 and PLOS Computational Biology.
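    As background for the PSSM baseline mentioned in this abstract, the sketch below builds a simple position-specific scoring matrix from a gap-free toy alignment and scores a sequence against it. It is a generic textbook construction under stated assumptions (uniform background frequencies, add-one pseudocounts, a made-up three-column alignment, hypothetical function names) and is unrelated to MRFalign's own scoring function or to the tools benchmarked in the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds score per column and residue, against a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2((counts[a] + pseudocount) / total / background)
            for a in alphabet
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum of per-column log-odds scores for an ungapped sequence."""
    return sum(column[res] for column, res in zip(pssm, seq))


# Toy alignment: every column is treated independently of the others.
msa = ["ACD", "ACE", "ACD"]
pssm = build_pssm(msa)
print(score_sequence(pssm, "ACD"), score_sequence(pssm, "WWW"))
```

    Because each column is scored on its own, a profile of this kind cannot express dependencies between distant positions, which is the limitation the abstract's MRF representation is meant to address.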