An Efficient MCMC Approach to Energy Function Optimization in Protein Structure Prediction
Protein structure prediction is a critical problem linked to drug design,
mutation detection, and protein synthesis, among other applications. To this
end, evolutionary data has been used to build contact maps which are
traditionally minimized as energy functions via gradient descent based schemes
like the L-BFGS algorithm. In this paper we present what we call the
Alternating Metropolis-Hastings (AMH) algorithm, which (a) significantly
improves the performance of traditional MCMC methods, (b) is inherently
parallelizable, allowing significant hardware acceleration on GPUs, and (c)
can be integrated with the L-BFGS algorithm to improve its performance. The
algorithm shows an improvement in energy of found structures of 8.17% to 61.04%
(average 38.9%) over traditional MH and 0.53% to 17.75% (average 8.9%) over
traditional MH with intermittent noisy restarts, tested across 9 proteins from
recent CASP competitions. We go on to map the Alternating MH algorithm to a
GPGPU which improves sampling rate by 277x and improves simulation time to a
low energy protein prediction by 7.5x to 26.5x over CPU. We show that our
approach can be incorporated into state-of-the-art protein prediction pipelines
by applying it to both trRosetta2's energy function and the distogram component
of AlphaFold1's energy function. Finally, we note that specially designed
probabilistic computers (or p-computers) can provide even better performance
than GPUs for MCMC algorithms like the one discussed here. Comment: 10 pages, 4 figures
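For orientation, the baseline that the abstract compares against (traditional Metropolis-Hastings used as an energy minimizer) can be sketched as below. This is a minimal illustration, not the paper's AMH algorithm: the energy function `toy_energy`, the Gaussian single-coordinate proposal, and all parameter values are assumptions chosen only to make the example runnable.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged energy: a quadratic bowl plus a cosine ripple.

    It stands in for a contact-map energy and is NOT taken from the paper.
    """
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)


def metropolis_hastings(energy, x0, steps=20000, step_size=0.5,
                        temperature=1.0, seed=0):
    """Plain Metropolis-Hastings search that tracks the lowest energy seen."""
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    best_x, best_e = list(x), e
    for _ in range(steps):
        # Symmetric Gaussian proposal on one randomly chosen coordinate.
        i = rng.randrange(len(x))
        cand = list(x)
        cand[i] += rng.gauss(0.0, step_size)
        e_cand = energy(cand)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / T).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / temperature):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


start = [3.0, -2.5, 1.5]
best_x, best_e = metropolis_hastings(toy_energy, start)
print(f"start energy {toy_energy(start):.3f}, best energy found {best_e:.3f}")
```

Each step needs only the energy of a single proposal, so many independent chains or proposals can be evaluated at once; that property is what the abstract's GPU mapping relies on.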
Establishing the precise evolutionary history of a gene improves prediction of disease-causing missense mutations
PURPOSE: Predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of the variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take into account orthologous/paralogous relationships. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. METHODS: We identified major events in NPC1 evolution and revealed and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. We predicted whether an amino acid substitution affects protein function by reducing the organism’s fitness. RESULTS: Removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. CONCLUSION: The results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy in predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well. Genet Med 18 10, 1029–1036
Exploiting Homology Information in Nontemplate Based Prediction of Protein Structures
In this paper we describe a novel strategy for exploring the conformational space of proteins and show that it leads to better models for proteins whose structure is not amenable to template-based methods. Our strategy is based on the assumption that the global energy minima of homologous proteins must correspond to similar conformations, while the precise profiles of their energy landscapes, and consequently the positions of the local minima, are likely to be different. In line with this hypothesis, we apply a replica exchange Monte Carlo simulation protocol that, rather than using different parameters for each parallel simulation, uses the sequences of homologous proteins. We show that our results are competitive with alternative methods, including those producing the best model for each of the analyzed targets in the free modeling category of the CASP10 (10th Critical Assessment of techniques for protein Structure Prediction) experiment.
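The exchange step of such a protocol can be illustrated with the standard Hamiltonian replica exchange acceptance rule, in which replicas share a temperature but differ in their energy function. The sketch below assumes that rule; the abstract does not state the exact criterion the authors use, and the one-dimensional energies `energy_a` and `energy_b` are hypothetical stand-ins for the energy landscapes of two homologous sequences.

```python
import math


def swap_probability(energy_i, energy_j, x_i, x_j, beta=1.0):
    """Acceptance probability for exchanging conformations x_i and x_j
    between two replicas that use different energy functions."""
    before = energy_i(x_i) + energy_j(x_j)
    after = energy_i(x_j) + energy_j(x_i)
    delta = after - before
    return 1.0 if delta <= 0 else math.exp(-beta * delta)


# Hypothetical 1-D landscapes with minima at +1 and -1.
def energy_a(x):
    return (x - 1.0) ** 2


def energy_b(x):
    return (x + 1.0) ** 2


# Each replica currently sits at the OTHER replica's minimum,
# so swapping lowers the total energy and is always accepted.
p_favourable = swap_probability(energy_a, energy_b, -1.0, 1.0)

# Each replica already sits at its own minimum, so a swap is rarely accepted.
p_unfavourable = swap_probability(energy_a, energy_b, 1.0, -1.0)

print(p_favourable, p_unfavourable)
```

The design point is that a conformation trapped in a local minimum of one sequence's landscape may be free to move on a homolog's landscape, while the shared global minimum keeps all replicas heading toward similar structures.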
Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMMs to identify genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are
the causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy for evading the host's immune
system, which consists of periodically changing their predominant cellular
surface protein (VSG). The motivation for using pattern recognition methods to
identify these genes, instead of traditional homology-based ones, is that the
levels of sequence identity (amino acid and DNA sequence) among these genes
are often below what is considered reliable for these methods. Among pattern
recognition approaches, HMMs are particularly suitable for this problem
because they handle the determination of gene edges more naturally. We
evaluate the performance of the model using different numbers of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes in T.
brucei can be safely identified (high sensitivity and low rate of false
positives) using HMMs. Comment: Article accepted in July 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
references
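To show how an HMM locates gene edges, the sketch below runs Viterbi decoding on a two-state toy model: the decoded state path switches between "background" and "gene", and the switch points are the predicted edges. The states, the probabilities, and the GC-rich emission profile are illustrative assumptions, not the model or parameters from the article.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for obs (log-space Viterbi)."""
    # scores[s] = best log-probability of any path that ends in state s.
    scores = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for symbol in obs[1:]:
        prev, scores, pointers = scores, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + log_trans[p][s])
            scores[s] = prev[best_prev] + log_trans[best_prev][s] + log_emit[s][symbol]
            pointers[s] = best_prev
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities into log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy parameters: "gene" regions are GC-rich, background is AT-rich.
states = ["background", "gene"]
log_start = logs({"background": 0.5, "gene": 0.5})
log_trans = {
    "background": logs({"background": 0.9, "gene": 0.1}),
    "gene": logs({"background": 0.1, "gene": 0.9}),
}
log_emit = {
    "background": logs({"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05}),
    "gene": logs({"A": 0.05, "T": 0.05, "G": 0.45, "C": 0.45}),
}

sequence = "AAAAGGGGGGAAAA"
path = viterbi(sequence, states, log_start, log_trans, log_emit)
for base, state in zip(sequence, path):
    print(base, state)
```

With these toy parameters the decoded path labels the run of G's as "gene" and the flanking A's as "background". Adding more states, as the abstract evaluates, lets a model represent finer structure inside and around each gene.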