Augmented training of hidden Markov models to recognize remote homologs via simulated evolution

Authors: A. Kumar, L. Cowen
Publication venue: Oxford University Press

Motivation: While profile hidden Markov models (HMMs) are successful and powerful methods for recognizing homologous proteins, they can break down when homology becomes too distant, owing to a lack of sufficient training data. We show that we can improve the performance of HMMs in this domain by using a simple simulated model of evolution to create an augmented training set.

Crossref

PubMed Central
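
The augmentation idea in the abstract above can be sketched as follows. This is a minimal, hypothetical sketch and not the authors' code. It assumes a crude "conservative substitution" model in which residues are swapped only within hand-picked groups such as `AVLIMC` or `DE`; a real simulated-evolution model would instead use proper substitution probabilities. The helper names `mutate` and `augment` are illustrative, and the augmented set would then be passed to whatever profile HMM trainer is in use.

```python
import random

# Hypothetical conservative-substitution groups (an assumption standing in
# for the substitution probabilities a real evolutionary model would use).
GROUPS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]
GROUP_OF = {aa: g for g in GROUPS for aa in g}


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of seq where each residue is, with probability `rate`,
    replaced by a different residue from the same group."""
    out = []
    for aa in seq:
        group = GROUP_OF.get(aa)
        if group and len(group) > 1 and rng.random() < rate:
            out.append(rng.choice([b for b in group if b != aa]))
        else:
            out.append(aa)  # unmutated or unknown residue is kept as-is
    return "".join(out)


def augment(training: list[str], copies: int, rate: float, seed: int = 0) -> list[str]:
    """Original training sequences followed by `copies` simulated mutants of each."""
    rng = random.Random(seed)
    augmented = list(training)
    for seq in training:
        for _ in range(copies):
            augmented.append(mutate(seq, rate, rng))
    return augmented
```

For example, `augment(["MKTAYIAKQR"], copies=3, rate=0.2)` returns the original sequence plus three simulated variants of the same length. Because substitutions stay within a group, each variant preserves the coarse physicochemical pattern of its parent.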

Recognition of beta-structural motifs using hidden Markov models trained with simulated evolution

Authors: A. Kumar, L. Cowen
Publication venue: Oxford University Press

Motivation: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models. However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in β-sheets. We therefore explore methods for incorporating pairwise dependencies into these models.

Crossref

PubMed Central

SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone

Authors: B. Berger, L. J. Cowen, N. M. Daniels, R. Hosur
Publication venue: Oxford University Press
Publication date: 01/03/2012

Motivation: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines simplified MRFs with simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif.

DSpace@MIT

Crossref

PubMed Central
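
The pairwise dependencies described in the two beta-structure abstracts above can be illustrated with a toy Markov-random-field score. The sketch is hypothetical and deliberately simplified. It assumes the query is already aligned to the model positions without gaps, and it scores that fixed alignment as a sum of per-position (singleton) terms plus one term per hydrogen-bonded position pair. It does not implement the alignment search or the specific simplifications SMURFLite makes for interleaved strands. The table layout and the name `mrf_score` are illustrative.

```python
from typing import Dict, List, Tuple

# singleton[i][aa]: hypothetical log-odds score for residue aa at model position i.
Singleton = List[Dict[str, float]]
# pairwise[(i, j)][(a, b)]: hypothetical log-odds score for residues a, b at
# positions i and j, which are assumed to be hydrogen-bonded partners in a beta sheet.
Pairwise = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def mrf_score(seq: str, singleton: Singleton, pairwise: Pairwise) -> float:
    """Score a gapless alignment of seq to the model positions.

    The score is a sum of singleton terms plus pairwise terms for coupled
    positions; residues or pairs missing from a table contribute 0.
    """
    if len(seq) != len(singleton):
        raise ValueError("sequence length must match the number of model positions")
    score = sum(singleton[i].get(aa, 0.0) for i, aa in enumerate(seq))
    for (i, j), table in pairwise.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score
```

Consider a model in which position pairs (0, 3) and (1, 2) reward the residue pairs (V, I) and (I, V). Under this model, `VIVI` scores 5.0 and `VIVA` scores 3.5, because `VIVA` loses both the singleton bonus at position 3 and the pairwise bonus for the (0, 3) pair. A profile HMM, which has only singleton terms, cannot express the second kind of penalty.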

A conditional neural fields model for protein threading

Authors: Jian Peng, Jianzhu Ma, Jinbo Xu, Sheng Wang
Publication venue: Oxford University Press

Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between the two proteins under consideration is low (<30%).

Crossref

PubMed Central

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Authors: Bobbie-Jo M Webb-Robertson, Christopher S Oehmen, Kyle G Ratuiste
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to be more accurate than traditional approaches for remote homology detection.
Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from the 4-mers of the sequence and normalized against the null distribution of the property over all possible 4-mers. With this approach, transforming a protein into feature space has little computational cost, and overall remote homology detection performance is comparable with current state-of-the-art methods. We demonstrate that the features can be used for pairwise remote homology detection with improved accuracy over sequence-based methods such as BLAST and over other feature-based methods of similar computational cost.
Conclusions: A protein feature method based on physicochemical properties is a viable way to extract features inexpensively while retaining the sensitivity of SVM-based protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection, in lieu of family-based homology detection, is important for applications such as large database searches and comparative genomics.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
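
The 4-mer feature construction described above can be sketched for a single property scale. This is an illustrative sketch under stated assumptions, not the authors' implementation:

- It uses only the Kyte-Doolittle hydropathy scale.
- It scores a 4-mer as the sum of its residue values.
- It normalizes by binning scores at quantiles of the null distribution over all 20^4 possible 4-mers, so that each bin is roughly equally likely under the null.

With this normalization, a feature value above 1 means the sequence is enriched in that score range. The bin count and the helper names `null_bin_edges` and `kmer_feature_vector` are hypothetical choices.

```python
import bisect
import itertools

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def null_bin_edges(scale, k=4, n_bins=10):
    """Interior bin edges at quantiles of the k-mer score distribution
    over all 20**k possible k-mers (the null distribution)."""
    values = list(scale.values())
    null_scores = sorted(sum(combo) for combo in itertools.product(values, repeat=k))
    n = len(null_scores)
    return [null_scores[n * b // n_bins] for b in range(1, n_bins)]


def kmer_feature_vector(seq, scale, edges, k=4):
    """Observed/expected frequency of the sequence's k-mer scores per null-quantile bin."""
    n_bins = len(edges) + 1
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * n_bins  # sequence shorter than k: no k-mers to count
    counts = [0] * n_bins
    for i in range(total):
        score = sum(scale[aa] for aa in seq[i:i + k])
        counts[bisect.bisect_right(edges, score)] += 1
    # Each bin holds about 1/n_bins of the null mass, so expected count = total / n_bins.
    return [c / total * n_bins for c in counts]
```

Compute `edges = null_bin_edges(KYTE_DOOLITTLE)` once. After that, `kmer_feature_vector(seq, KYTE_DOOLITTLE, edges)` gives a fixed-length vector per protein that can be compared pairwise or passed to an SVM. Non-standard residue letters such as `X` would raise a `KeyError` and need to be filtered out first.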

Compressive genomics for protein databases

Authors: A. Gallant, B. Berger, J. Peng, L. J. Cowen, M. Baym, N. M. Daniels
Publication venue: Oxford University Press (OUP)
Publication date: 01/06/2013

Motivation: The exponential growth of protein sequence databases has increasingly made the fundamental question of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast, and this fact can be exploited to greatly accelerate homology search. Accelerating programs in the popular PSI/DELTA-BLAST family of tools will speed up not only homology search itself but also the large collection of other current programs that interact with large protein databases primarily through these tools.
Results: We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), that are significantly faster than, and comparably accurate to, all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Our tools are implemented so that they can be substituted directly into existing analysis pipelines. The key idea is a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, whereas current BLASTP variants scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks that rely heavily on homology search, such as protein structure prediction and orthology mapping.
Availability: CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central
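
The coarse-then-fine search idea behind compressive acceleration can be illustrated with a toy. This sketch is hypothetical and much simpler than CaBLASTP. Near-duplicate sequences are clustered under a representative, and the query is compared first against representatives only. Only clusters that pass a permissive coarse threshold are expanded and their members checked.

The sketch differs from CaBLASTP in two further ways:

- It uses 3-mer Jaccard similarity in place of BLAST alignment scores.
- It keeps full copies of member sequences rather than the difference-encoded records a real compressed store would use.

As a result, it illustrates the search pruning rather than the space savings. All names and thresholds are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets: a cheap stand-in for an alignment score."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


@dataclass
class ClusteredDB:
    originals: List[str] = field(default_factory=list)
    representatives: List[str] = field(default_factory=list)
    members: List[List[int]] = field(default_factory=list)  # indices into originals


def cluster_database(seqs: List[str], link_threshold: float = 0.6) -> ClusteredDB:
    """Greedily link each sequence to the first sufficiently similar representative."""
    db = ClusteredDB()
    for idx, seq in enumerate(seqs):
        db.originals.append(seq)
        for r, rep in enumerate(db.representatives):
            if similarity(seq, rep) >= link_threshold:
                db.members[r].append(idx)
                break
        else:  # no representative was close enough: seq becomes a new one
            db.representatives.append(seq)
            db.members.append([idx])
    return db


def search(db: ClusteredDB, query: str,
           coarse_threshold: float = 0.2, fine_threshold: float = 0.5) -> List[int]:
    """Coarse pass over representatives, then fine pass over surviving clusters."""
    hits: List[int] = []
    for rep, member_ids in zip(db.representatives, db.members):
        if similarity(query, rep) < coarse_threshold:
            continue  # prune the entire cluster without touching its members
        hits.extend(i for i in member_ids
                    if similarity(query, db.originals[i]) >= fine_threshold)
    return sorted(hits)
```

When most sequences are near-duplicates, the coarse pass touches far fewer sequences than the full database contains. This mirrors the abstract's point that cost should scale with the amount of unique data.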

Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

Author: Govindarajoo Brandon
Publication date: 01/01/2016

Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template-based quaternary structure prediction starting from sequence. Template-based modeling has two critical steps:
- identifying the correct homologous structures by threading, which generates sequence-to-structure alignments;
- refining the initial threading template coordinates toward the native conformation.

I developed SPRING (single-chain-based prediction of interactions and geometries), a program that maps monomer threading results to dimer templates. I compared it with the dimer co-threading program COTH on 1838 non-homologous target complex structures. SPRING's similarity score outperformed COTH in first-place template ranking, correctly identifying 798 versus 527 interfaces. More importantly, the results were complementary: combining the two programs in a consensus-based threading program gave a 5.1% improvement over SPRING alone.

Template-based modeling requires a structural analog to be present in the PDB. A full search of the PDB using threading and structural alignment revealed that only 48.7% of the PDB has a suitable template, and only 39.4% has templates that can be identified by threading. To circumvent this, I added intramolecular domain-domain interfaces to the PDB library to boost template recognition of protein dimers. Merging the two classes of interfaces improved recognition of heterodimers by 40% under benchmark settings.

Next, I created TACOS, a pipeline for template-based assembly of protein complexes. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge-based energy score. This energy score is integrated into a Monte Carlo sampling simulation that drives the initial template closer to the native topology. The full pipeline was benchmarked on 350 non-homologous structures and compared with two state-of-the-art programs for dimeric structure prediction, ZDOCK and MODELLER. On average, TACOS models have better global and interface structure quality than models generated by MODELLER and ZDOCK.
PhD thesis, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd

Deep Blue Documents at the University of Michigan
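
The Monte Carlo refinement step described above follows the general Metropolis pattern:
1. Propose a perturbed conformation.
2. Always accept moves that lower the energy.
3. Accept uphill moves with probability exp(-ΔE/T).

The sketch below is a generic illustration, not TACOS. The "conformation" is just a 3-D offset, and the energy is a hypothetical squared distance to a made-up native position at the origin, standing in for the knowledge-based score the abstract describes. The function names are illustrative.

```python
import math
import random
from typing import Callable, Tuple

Coords = Tuple[float, ...]


def metropolis(x0: Coords,
               energy: Callable[[Coords], float],
               propose: Callable[[Coords, random.Random], Coords],
               steps: int, temperature: float, seed: int = 0) -> Tuple[Coords, float]:
    """Metropolis Monte Carlo; returns the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = propose(x, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e
        # Downhill moves are always accepted; uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Hypothetical toy problem: the "native" position is the origin.
def toy_energy(c: Coords) -> float:
    return sum(v * v for v in c)


def toy_propose(c: Coords, rng: random.Random) -> Coords:
    return tuple(v + rng.uniform(-0.5, 0.5) for v in c)
```

Starting from `(5.0, 5.0, 5.0)`, `metropolis(start, toy_energy, toy_propose, steps=5000, temperature=0.1)` drifts toward the origin. The temperature controls how often uphill moves are accepted, which lets the search escape shallow local minima in a rugged energy landscape.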

Structure-based algorithms for protein-protein interaction prediction

Author: Hosur Raghavendra
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2012.
Protein-protein interactions (PPIs) play a central role in all biological processes. Akin to the complete sequencing of genomes, complete descriptions of interactomes are a fundamental step toward a deeper understanding of biological processes. They have vast potential to impact systems biology, genomics, molecular biology and therapeutics. PPIs are critical in the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. This thesis develops new methods that significantly advance structure-based approaches to predicting PPIs and boost confidence in emerging high-throughput (HTP) data. The aims of this thesis are to:
1) use physicochemical properties of protein interfaces to better predict putative interacting regions and increase the coverage of PPI prediction;
2) increase confidence in HTP datasets by identifying likely experimental errors;
3) provide residue-level information that gives insight into structure-function relationships in PPIs.

Taken together, these methods will vastly expand our understanding of macromolecular networks. In this thesis, I introduce two computational approaches for structure-based protein-protein interaction prediction: iWRAP and Coev2Net.

iWRAP is an interface threading approach that uses biophysical properties specific to protein interfaces to improve PPI prediction. Unlike previous structure-based approaches that use single structures to make predictions, iWRAP first builds profiles from multiple interface alignments. These profiles characterize the hydrophobic, electrostatic and structural properties specific to protein interfaces. Compatibility with these profiles is used to predict the putative interface region between two proteins. In addition to improved interface prediction, iWRAP provides better accuracy and a close to 50% increase in coverage on genome-scale PPI prediction tasks. As an application, we combine iWRAP with genomic data to identify novel cancer-related genes involved in chromatin remodeling, nucleosome organization and ribonuclear complex assembly, processes known to be critical in cancer.

Coev2Net addresses some of the limitations of iWRAP and provides techniques to increase coverage and accuracy further. Unlike earlier sequence and structure profiles, Coev2Net explicitly models long-distance correlations at protein interfaces. We formulate interface co-evolution as a high-dimensional sampling problem. This lets us enrich sequence/structure profiles with artificial interacting homologous sequences for families that do not have known multiple interacting homologs. We build a spanning-tree-based graphical model induced by the simulated sequences as our interface profile. Cross-validation results indicate that this approach is as good as previous methods at PPI prediction. We show that Coev2Net's predictions correlate with experimental observations, and we experimentally validate some of the high-confidence predictions. Furthermore, we demonstrate how analysis of the predicted interfaces, together with human genomic variation data, can help us understand the role of these mutations in disease and normal cells.

DSpace@MIT
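
A spanning-tree graphical model over interface positions, of the kind the Coev2Net abstract above describes, can be fit to a set of aligned (for example, simulated) sequences. One option is the classic Chow-Liu procedure: weight every pair of alignment columns by their mutual information and keep a maximum-weight spanning tree. This is an assumption about one reasonable way to build such a tree, not a description of Coev2Net's actual method. The sketch also omits the co-evolution sampling step and any pseudocounts. The function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    # sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log((c * n) / (pa[a] * pb[b])) for (a, b), c in pab.items())


def chow_liu_edges(alignment: List[str]) -> List[Tuple[int, int]]:
    """Maximum-weight spanning tree over columns, weighted by MI (Kruskal's algorithm)."""
    n_cols = len(alignment[0])
    cols = [[seq[i] for seq in alignment] for i in range(n_cols)]
    weighted = sorted(((mutual_information(cols[i], cols[j]), i, j)
                       for i, j in combinations(range(n_cols), 2)), reverse=True)
    parent = list(range(n_cols))  # union-find forest over columns

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components, so keep it
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Consider the toy alignment `["AKD", "AKE", "VRD", "VRE"]`. Columns 0 and 1 co-vary perfectly (mutual information ln 2), while column 2 is independent of both, so the first tree edge chosen is (0, 1). The tree keeps only the strongest n - 1 couplings, which is what makes exact inference on such a profile cheap.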