Distance-based Protein Folding Powered by Deep Learning
Contact-assisted protein folding has made very good progress, but two
challenges remain. One is accurate contact prediction for proteins lacking many
sequence homologs, and the other is that time-consuming folding simulation is
often needed to predict good 3D models from predicted contacts. We show that the
protein distance matrix can be predicted well by deep learning and then
directly used to construct 3D models without folding simulation at all. Using
distance geometry to construct 3D models from our predicted distance matrices,
we successfully folded 21 of the 37 CASP12 hard targets with a median family
size of 58 effective sequence homologs within 4 hours on a 20-CPU Linux
computer. In contrast, contacts predicted by direct coupling analysis (DCA) cannot
fold any of them in the absence of folding simulation and the best CASP12 group
folded 11 of them by integrating predicted contacts into complex,
fragment-based folding simulation. Rigorous experimental validation on 15
CASP13 targets shows that among the 3 hardest targets of new fold, our
distance-based folding servers successfully folded 2 large ones with <150
sequence homologs while the other servers failed on all three, and that our ab
initio folding server also predicted the best, high-quality 3D model for a
large homology modeling target. Further experimental validation in CAMEO shows
that our ab initio folding server predicted correct fold for a membrane protein
of new fold with 200 residues and 229 sequence homologs while all the other
servers failed. These results imply that deep learning offers an efficient and
accurate solution for ab initio folding on a personal computer.
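The abstract names distance geometry as the step that turns a predicted distance matrix into 3D coordinates, but does not say which implementation was used. As a minimal sketch, assuming a complete and nearly Euclidean distance matrix, the block below uses classical (Torgerson) multidimensional scaling, one of the simplest distance-geometry methods: it double-centres the squared distances to obtain a Gram matrix and reads coordinates off its top three eigenpairs (found here by power iteration with deflation, to stay in the standard library). The function `classical_mds` and its parameters are hypothetical and are not the authors' pipeline.

```python
import math
import random


def classical_mds(dist, dim=3, iters=1000, seed=0):
    """Recover coordinates (up to rotation/reflection) from a Euclidean
    distance matrix with classical (Torgerson) multidimensional scaling."""
    n = len(dist)
    # Squared distances and their row means (the matrix is symmetric,
    # so row means equal column means).
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring gives the Gram matrix B = -1/2 * J D^2 J of the
    # centred coordinates.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        # Power iteration for the dominant eigenvector of the current B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (v has unit length).
        bv = matvec(b, v)
        lam = sum(v[i] * bv[i] for i in range(n))
        # Axis k = sqrt(eigenvalue) * eigenvector; negative eigenvalues
        # (non-Euclidean noise) are clipped to zero.
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

Classical MDS cannot distinguish a structure from its mirror image, so a real protein-modeling pipeline must also fix chirality and handle noisy or missing distances, which this sketch does not attempt.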
Isolation, characterisation and expression patterns of a RAD51 ortholog from Pleurotus ostreatus
Using degenerate primers for conserved regions of RecA homologs, we have isolated a gene from Pleurotus ostreatus that shows characteristic features of RAD51 homologs. The encoded amino acid sequence of P. ostreatus RAD51 (PoRAD51) shows the greatest sequence similarity to RAD51 from Coprinus cinereus (89% identity). Furthermore, the genomic organisation of PoRAD51 is almost identical to that of RAD51 from C. cinereus. Northern analysis shows that PoRAD51 is expressed in vegetative mycelium and fruit body tissue, and that it is expressed at elevated levels in lamellae/basidia and following DNA damage. A sporulation-deficient mutant strain of P. ostreatus (ATCC 58937) showed expression patterns of the RAD51 gene that are similar to those of the normal sporulating strain.
An Alternative Model of Amino Acid Replacement
The observed correlations between pairs of homologous protein sequences are
typically explained in terms of a Markovian dynamic of amino acid substitution.
This model assumes that every location on the protein sequence has the same
background distribution of amino acids, an assumption that is incompatible with
the observed heterogeneity of protein amino acid profiles and with the success
of profile multiple sequence alignment. We propose an alternative model of
amino acid replacement during protein evolution based upon the assumption that
the variation of the amino acid background distribution from one residue to the
next is sufficient to explain the observed sequence correlations of homologs.
The resulting dynamical model of independent replacements drawn from
heterogeneous backgrounds is simple and consistent, and provides a unified
homology match score for sequence-sequence, sequence-profile and
profile-profile alignment.
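To make the idea concrete, here is a minimal sketch under an assumption the abstract does not state: site backgrounds come from a small, discrete set of classes with prior weights. Two homologous residues at a site are independent draws from that site's (unobserved) background, so their joint probability is a mixture over classes, and the match score is the log ratio of that joint probability to the product of the marginal background frequencies. The function `site_class_scores` and the class/weight inputs are hypothetical, and only the sequence-sequence case is shown.

```python
import math


def site_class_scores(classes, weights):
    """Log-odds match scores (nats) when both homologous residues are
    independent draws from the same, unobserved site background.

    classes: list of dicts mapping residue -> probability, one per class
    weights: prior probability of each class (should sum to 1)
    Returns a dict mapping (a, b) -> score."""
    alphabet = sorted(classes[0])
    # Marginal background: q(a) = sum_k w_k * p_k(a)
    q = {a: sum(w * p[a] for w, p in zip(weights, classes)) for a in alphabet}
    scores = {}
    for a in alphabet:
        for b in alphabet:
            # Joint probability of a and b sharing one site background:
            # P(a, b) = sum_k w_k * p_k(a) * p_k(b)
            joint = sum(w * p[a] * p[b] for w, p in zip(weights, classes))
            scores[(a, b)] = math.log(joint / (q[a] * q[b]))
    return scores
```

With a single background class every score is zero: under this model, all correlation between homologs comes from heterogeneity of site backgrounds, with no explicit substitution dynamics.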
Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.
Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them are important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, one would expect that using protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.
Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield improvements equivalent to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.
Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
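The abstract does not say how the two model sets were combined. One simple assumption is to pool the hits from both model libraries and keep, for each query, the superfamily assignment with the lowest E-value. The function `combine_model_hits` and the `(query, superfamily, e_value)` tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit record: (query_id, superfamily_id, e_value)
Hit = Tuple[str, str, float]


def combine_model_hits(*hit_lists: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """For each query, keep the superfamily assignment with the lowest
    E-value across all model libraries (e.g. sequence-only and
    structure alignment-enhanced)."""
    best: Dict[str, Tuple[str, float]] = {}
    for hits in hit_lists:
        for query, superfamily, e_value in hits:
            if query not in best or e_value < best[query][1]:
                best[query] = (superfamily, e_value)
    return best
```

E-values reported by libraries of different sizes are not strictly comparable, so a careful combination would recalibrate them (or compare per-library ranks) before taking the minimum.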
HorA web server to infer homology between proteins using sequence and structural similarity
The biological properties of proteins are often gleaned through comparative analysis of evolutionary relatives. Although protein structure similarity search methods detect more distant homologs than purely sequence-based methods, structural resemblance can result from either homology (common ancestry) or analogy (similarity without common ancestry). While many existing web servers detect structural neighbors, they do not explicitly address the question of homology versus analogy. Here, we present a web server named HorA (Homology or Analogy) that identifies likely homologs for a query protein structure. Unlike other servers, HorA combines sequence information from state-of-the-art profile methods with structure information from spatial similarity measures using an advanced computational technique. HorA aims to identify biologically meaningful connections rather than purely 3D-geometric similarities. The HorA method finds ∼90% of remote homologs defined in the manually curated database SCOP. HorA will be especially useful for finding remote homologs that might be overlooked by other sequence or structural similarity search servers. The HorA server is available at http://prodata.swmed.edu/horaserver
Towards Alignment Independent Quantitative Assessment of Homology Detection
Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the similarity cutoff could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimating the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides users with an intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the number of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins across a wide range of homology values. In summary, the presented method can accompany homology searches and method development, validates search results, and allows tuning of tools and methods.
Emergence of Protein Fold Families through Rational Design
Diverse proteins with similar structures are grouped into families of homologs and analogs if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge on distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we hypothesize that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates by only 1–2 Å root-mean-square deviation from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of naturally occurring sequences, thereby validating our computational design methodology.
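The homolog/analog split above rests on a sequence similarity threshold. As a sketch, assuming percent identity over aligned, ungapped columns as the similarity measure (a common proxy; the abstract does not define it), the block computes identity from a pairwise alignment and labels a structurally similar pair using the 20%–30% band quoted above. The function names and labels are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither sequence
    has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def family_label(identity: float, low: float = 20.0, high: float = 30.0) -> str:
    """Label a pair of structurally similar proteins by sequence identity,
    using the 20%-30% band as an ambiguous zone."""
    if identity > high:
        return "homolog"
    if identity < low:
        return "analog"
    return "twilight zone"
```

Usage: `percent_identity("ACDE-G", "ACDFHG")` compares five ungapped columns, four of which match, giving 80.0, which `family_label` classifies as "homolog".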
Survey of Human Mitochondrial Diseases Using New Genomic/Proteomic Tools
BACKGROUND. We have constructed Bayesian prior-based, amino-acid sequence profiles for the complete yeast mitochondrial proteome and used them to develop methods for identifying and characterizing the context of protein mutations that give rise to human mitochondrial diseases. (Bayesian priors are conditional probabilities that allow the estimation of the likelihood of an event - such as an amino-acid substitution - on the basis of prior occurrences of similar events.) Because these profiles can assemble sets of taxonomically very diverse homologs, they enable identification of the structurally and/or functionally most critical sites in the proteins on the basis of the degree of sequence conservation. These profiles can also find distant homologs with determined three-dimensional structures that aid in the interpretation of the effects of missense mutations. RESULTS. This survey reports such an analysis for 15 missense mutations, one insertion and three deletions involved in Leber's hereditary optic neuropathy, Leigh syndrome, mitochondrial neurogastrointestinal encephalomyopathy, Mohr-Tranebjaerg syndrome, iron-storage disorders related to Friedreich's ataxia, and hereditary spastic paraplegia. We present structural correlations for seven of the mutations. CONCLUSIONS. Of the 19 mutations analyzed, 14 involved changes in very highly conserved parts of the affected proteins. Five of the seven structural correlations provided reasonable explanations for the malfunctions. As additional genetic and structural data become available, this methodology can be extended. It has the potential to assist in identifying new disease-related genes. Furthermore, profiles with structural homologs can generate mechanistic hypotheses concerning the underlying biochemical processes - and why they break down as a result of the mutations.
Funding: United States Department of Energy (DE-FG02-98ER62558); National Science Foundation (DBI-9807993).
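The abstract does not specify the prior behind its profiles. To show how a Bayesian prior turns raw column counts into a profile, and how conservation can then be scored, here is a minimal sketch that assumes a single Dirichlet prior proportional to the background frequencies (simpler than the Dirichlet-mixture priors used by many profile tools). The posterior mean for residue a is (n_a + A·q_a)/(N + A), and conservation is the relative entropy of the column profile against the background. `column_profile`, `conservation`, and `pseudocount_weight` (A) are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column, background, pseudocount_weight=1.0):
    """Posterior-mean residue probabilities for one alignment column under
    a Dirichlet prior with parameters pseudocount_weight * background.
    Characters not in the background (e.g. gaps '-') are ignored."""
    counts = Counter(r for r in column if r in background)
    total = sum(counts.values())
    return {a: (counts[a] + pseudocount_weight * background[a])
               / (total + pseudocount_weight)
            for a in background}


def conservation(profile, background):
    """Relative entropy (bits) of a column profile against the background;
    higher values indicate more strongly conserved sites."""
    return sum(p * math.log2(p / background[a])
               for a, p in profile.items() if p > 0)
```

With a uniform background, an invariant column such as "WWWWWWWWWW" scores about 3.5 bits, while a column of ten different residues scores well under 1 bit. That ranking is the kind of signal used to flag mutations that fall in highly conserved positions.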