An Alternative Model of Amino Acid Replacement
The observed correlations between pairs of homologous protein sequences are
typically explained in terms of a Markovian dynamic of amino acid substitution.
This model assumes that every location on the protein sequence has the same
background distribution of amino acids, an assumption that is incompatible with
the observed heterogeneity of protein amino acid profiles and with the success
of profile multiple sequence alignment. We propose an alternative model of
amino acid replacement during protein evolution based upon the assumption that
the variation of the amino acid background distribution from one residue to the
next is sufficient to explain the observed sequence correlations of homologs.
The resulting dynamical model of independent replacements drawn from
heterogeneous backgrounds is simple and consistent, and provides a unified
homology match score for sequence-sequence, sequence-profile and
profile-profile alignment. Comment: Minor improvements. Added figure and reference.
eVolver: An optimization engine for evolving protein sequences to stabilize the respective structures
Background: Many structural bioinformatics approaches employ sequence profile-based threading techniques. To improve fold recognition rates, homology searching may include artificially evolved amino acid sequences, which were demonstrated to enhance the sensitivity of protein threading in targeting midnight zone templates. Findings: We describe implementation details of eVolver, an optimization algorithm that evolves protein sequences to stabilize the respective structures by a variety of potentials, which are compatible with those commonly used in protein threading. In a case study focusing on the LARG PDZ domain, we show that artificially evolved sequences are highly capable of recognizing the correct protein structures using standard sequence profile-based fold recognition. Conclusions: Computationally designed protein sequences can be incorporated in existing sequence profile-based threading approaches to increase their sensitivity. They also provide a desired linkage between protein structure and function in in silico experiments that relate to, e.g., the completeness of protein structure space, the origin of folds and the protein universe. eVolver is freely available as a user-friendly webserver and a well-documented stand-alone software distribution at http://www.brylinski.org/evolver. © 2013 Brylinski; licensee BioMed Central Ltd.
Sequence Profile of the Parallel β Helix in the Pectate Lyase Superfamily
The parallel β helix structure found in the pectate lyase superfamily has been analyzed in detail. A comparative analysis of known structures has revealed a unique sequence profile, with a strong positional preference for specific amino acids oriented toward the interior of the parallel β helix. Using the unique sequence profile, search patterns have been constructed and applied to the sequence databases to identify a subset of proteins that are likely to fold into the parallel β helix. Of the 19 families identified, 39% are known to be carbohydrate-binding proteins, and 50% belong to a broad category of proteins with sequences containing leucine-rich repeats (LRRs). The most striking result is the sequence match between the search pattern and four contiguous segments of internalin A, a surface protein from the bacterial pathogen Listeria monocytogenes. A plausible model of the repetitive LRR sequences of internalin A has been constructed and favorable 3D–1D profile scores have been calculated. Moreover, spectroscopic features characteristic of the parallel β helix topology in the pectate lyases are present in the circular dichroic spectrum of internalin A. Altogether, the data support the hypothesis that sequence search patterns can be used to identify proteins, including a subset of LRR proteins, that are likely to fold into the parallel β helix.
SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors
The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its
wide use in biological sequence database search. Unfortunately, the high
sensitivity comes at the expense of quadratic time complexity, which makes the
algorithm computationally demanding for big databases. In this paper, we
present SWAPHI, the first parallelized algorithm employing Xeon Phi
coprocessors to accelerate SW protein database search. SWAPHI is designed based
on the scale-and-vectorize approach, i.e. it boosts alignment speed by
effectively utilizing both the coarse-grained parallelism from the many
co-processing cores (scale) and the fine-grained parallelism from the 512-bit
wide single instruction, multiple data (SIMD) vectors within each core
(vectorize). By searching against the large UniProtKB/TrEMBL protein database,
SWAPHI achieves a performance of up to 58.8 billion cell updates per second
(GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors.
Furthermore, it demonstrates good parallel scalability on a varying number of
coprocessors, and is also superior to both SWIPE on 16 high-end CPU cores and
BLAST+ on 8 cores when using four coprocessors, with maximum speedups of
1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of
SIMD intrinsics), and is freely available at http://swaphi.sourceforge.net.
Comment: A short version of this paper has been accepted by the IEEE ASAP 2014
conference.
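As a rough illustration of why Smith-Waterman search is quadratic in time, and of how a GCUPS figure is computed, here is a minimal Python sketch. It is not SWAPHI's implementation: the function names, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, and no SIMD or multi-core parallelism is modeled.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (hypothetical scoring scheme, linear gaps).

    Every (query residue, target residue) pair updates one cell, so the
    work is len(query) * len(target): the quadratic cost noted above.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Billion cell updates per second for one query against a database."""
    return query_len * db_residues / seconds / 1e9
```

For example, under these assumed scores `smith_waterman_score("XXACGTYY", "ACGT")` finds the embedded exact match, and `gcups` simply divides the number of matrix cells by the elapsed time.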
The Phyre2 web portal for protein modeling, prediction and analysis
Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
MRFalign: Protein Homology Detection through Alignment of Markov Random Fields
Sequence-based protein homology detection has been extensively studied and so
far the most sensitive method is based upon comparison of protein sequence
profiles, which are derived from multiple sequence alignment (MSA) of sequence
homologs in a protein family. A sequence profile is usually represented as a
position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and
accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This
paper presents a new homology detection method MRFalign, consisting of three
key components: 1) a Markov Random Fields (MRF) representation of a protein
family; 2) a scoring function measuring similarity of two MRFs; and 3) an
efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning
two MRFs. Compared to HMMs, which can model only very short-range residue
correlations, MRFs can model long-range residue interaction patterns and thus
encode information about the global 3D structure of a protein family.
Consequently, MRF-MRF comparison for remote homology detection should be much
more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that
MRFalign outperforms several popular HMM or PSSM-based methods in terms of both
alignment accuracy and remote homology detection and that MRFalign works
particularly well for mainly beta proteins. For example, tested on the
benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM
succeed on 48% and 52% of proteins, respectively, at superfamily level, and on
15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign
succeeds on 57.3% and 42.5% of proteins at superfamily and fold level,
respectively. This study implies that long-range residue interaction patterns
are very helpful for sequence-based homology detection. The software is
available for download at http://raptorx.uchicago.edu/download/.
Comment: Accepted by both RECOMB 2014 and PLOS Computational Biology.
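To make the notion of a sequence profile concrete, the following Python sketch builds a simple log-odds PSSM from a gapless multiple sequence alignment and scores a sequence against it. It is not MRFalign's method and does not model the MRF representation or ADMM alignment; the function names, the uniform background distribution and the add-one pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(msa, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    Assumes equal-length aligned sequences and a uniform background
    frequency of 1/20 (a simplification; real tools use empirical ones).
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))
```

With a toy alignment such as `["ACD", "ACE", "ACD"]`, residues observed in a column receive positive scores and unobserved ones negative scores, so a family member scores higher than an unrelated sequence.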