24,927 research outputs found
Recommended from our members
A computer system to perform structure comparison using TOPS representations of protein structure
We describe the design and implementation of a fast topology–based method
for protein structure comparison. The approach uses the TOPS topological representation
of protein structure, aligning two structures using a common discovered
pattern and generating measure of distance derived from an insert score. Heavy
use is made of a constraint-based pattern matching algorithm for TOPS diagrams
that we have designed and described elsewhere Gilbert et al. (1999). The comparison
system is maintained at the European Bioinformatics Institute and is available
over the Web via the at tops.ebi.ac.uk/tops. Users submit a structure description in
Protein Data Bank (PDB) format and can compare it with structures in the entire
PDB or a representative subset of protein domains, receiving the results by email
Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints
The inapplicability of amino acid covariation methods to small protein
families has limited their use for structural annotation of whole genomes.
Recently, deep learning has shown promise in allowing accurate residue-residue
contact prediction even for shallow sequence alignments. Here we introduce
DMPfold, which uses deep learning to predict inter-atomic distance bounds, the
main chain hydrogen bond network, and torsion angles, which it uses to build
models in an iterative fashion. DMPfold produces more accurate models than two
popular methods for a test set of CASP12 domains, and works just as well for
transmembrane proteins. Applied to all Pfam domains without known structures,
confident models for 25% of these so-called dark families were produced in
under a week on a small 200 core cluster. DMPfold provides models for 16% of
human proteome UniProt entries without structures, generates accurate models
with fewer than 100 sequences in some cases, and is freely available.Comment: JGG and SMK contributed equally to the wor
MRFalign: Protein Homology Detection through Alignment of Markov Random Fields
Sequence-based protein homology detection has been extensively studied and so
far the most sensitive method is based upon comparison of protein sequence
profiles, which are derived from multiple sequence alignment (MSA) of sequence
homologs in a protein family. A sequence profile is usually represented as a
position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model) and
accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This
paper presents a new homology detection method MRFalign, consisting of three
key components: 1) a Markov Random Fields (MRF) representation of a protein
family; 2) a scoring function measuring similarity of two MRFs; and 3) an
efficient ADMM (Alternating Direction Method of Multipliers) algorithm aligning
two MRFs. Compared to HMM that can only model very short-range residue
correlation, MRFs can model long-range residue interaction pattern and thus,
encode information for the global 3D structure of a protein family.
Consequently, MRF-MRF comparison for remote homology detection shall be much
more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that
MRFalign outperforms several popular HMM or PSSM-based methods in terms of both
alignment accuracy and remote homology detection and that MRFalign works
particularly well for mainly beta proteins. For example, tested on the
benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM
succeed on 48% and 52% of proteins, respectively, at superfamily level, and on
15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign
succeeds on 57.3% and 42.5% of proteins at superfamily and fold level,
respectively. This study implies that long-range residue interaction patterns
are very helpful for sequence-based homology detection. The software is
available for download at http://raptorx.uchicago.edu/download/.Comment: Accepted by both RECOMB 2014 and PLOS Computational Biolog
Recommended from our members
Topology-based protein structure comparison using a pattern discovery technique
Phylogenomic analysis reveals extensive phylogenetic mosaicism in the Human GPCR Superfamily
A novel high throughput phylogenomic analysis (HTP) was applied to the rhodopsin G-protein coupled receptor (GPCR) family. Instances of phylogenetic mosaicism between receptors were found to be frequent, often as instances of correlated mosaicism and repeated mosaicism. A null data set was constructed with the same phylogenetic topology as the rhodopsin GPCRs. Comparison of the two data sets revealed that mosaicism was found in GPCRs in a higher frequency than would be expected by homoplasy or the effects of topology alone. Various evolutionary models of differential conservation, recombination and homoplasy are explored which could result in the patterns observed in this analysis. We find that the results are most consistent with frequent recombination events. A complex evolutionary history is illustrated in which it is likely frequent recombination has endowed GPCRs with new functions. The pattern of mosaicism is shown to be informative for functional prediction for orphan receptors. HTP analysis is complementary to conventional phylogenomic analyses revealing mosaicism that would not otherwise have been detectable through conventional phylogenetics
- …