
11 research outputs found

DeepSF: deep convolutional neural network for mapping protein sequences to folds

Author: Alfonso Valencia
Altschul
Altschul
Badri Adhikari
Berman
Cao
Chandonia
Cheng
Cheng
Chung
Cui
Damoulas
Dill
Dong
Eickholt
Greene
Hadley
Henikoff
Holm
Jackson
Jianlin Cheng
Jie Hou
Jo
Jo
Kalchbrenner
Kim
Kinch
Kinch
Krizhevsky
Li
Ma
Magnan
McGuffin
Murzin
Shen
Spencer
Srivastava
Söding
Wang
Wang
Wang
Webb
Wei
Xia
Xu
Zhang
Publication date: 03/06/2017

Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences directly, and methodological limitations restrict them to a small number of folds, so they are not generally useful in practice. Results: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 77.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on template-free modeling targets and 4.5%-16.7% higher on hard template-based modeling targets for the top 1, 5, and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking. Comment: 28 pages, 13 figures
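The abstract above reports accuracy for the top 1, 5, and 10 predicted folds. As a rough illustration of how such a top-k figure can be computed, here is a minimal sketch. The function name `top_k_accuracy`, the fold labels, and all scores are made up for the example and are not taken from DeepSF or its evaluation data.

```python
def top_k_accuracy(scores, true_folds, k):
    """Fraction of targets whose true fold is among the k highest-scoring folds.

    scores: one dict per target, mapping fold label -> predicted score.
    true_folds: the correct fold label for each target, in the same order.
    """
    hits = 0
    for fold_scores, true_fold in zip(scores, true_folds):
        # Rank fold labels from highest to lowest predicted score.
        ranked = sorted(fold_scores, key=fold_scores.get, reverse=True)
        if true_fold in ranked[:k]:
            hits += 1
    return hits / len(true_folds)


# Toy, invented scores for four targets over three hypothetical folds.
toy_scores = [
    {"a.1": 0.70, "b.1": 0.20, "c.1": 0.10},
    {"a.1": 0.50, "b.1": 0.30, "c.1": 0.20},
    {"a.1": 0.10, "b.1": 0.30, "c.1": 0.60},
    {"a.1": 0.40, "b.1": 0.35, "c.1": 0.25},
]
toy_truth = ["a.1", "b.1", "c.1", "c.1"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_truth, k))
```

A larger k can only keep or raise the accuracy, since the top-k list always contains the top-(k-1) list.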

arXiv.org e-Print Archive

Crossref

University of Missouri, St. Louis

Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

Author: Burian Dennis
Dozmorov Mikhail G.
Hoyt Peter
Kaundal Rakesh
Perkins Andy
Wren Jonathan D.
Zhang Chaoyang
Publication venue: The Aquila Digital Community
Publication date: 21/10/2014

Aquila Digital Community (University of Southern Mississippi, USM)

Springer - Publisher Connector

PubMed Central

Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference

Author: Andy Perkins
Chaoyang Zhang
Dennis Burian
EA Peterson
H Ng
IT Toby
J Hennessey
J Hennessey
Jonathan D Wren
M Jaiswal
MA Bauer
Mikhail G Dozmorov
NS Vo
Peter Hoyt
Rakesh Kaundal
SC Grace
SS Sahu
T Jo
T Weirick
W Zhang
W Zhang
W Zhao
Y Peng
Z Yue
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/10/2014

Aquila Digital Community (University of Southern Mississippi, USM)

Crossref

Springer - Publisher Connector

PubMed Central

Protein Fold Recognition from Sequences using Convolutional and Recurrent Neural Networks

Author: Gómez García Ángel Manuel
Morales Cordovilla Juan Andrés
Sánchez Calle Victoria Eugenia
Villegas Morcillo Amelia Otilia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/12/2021

The identification of a protein fold type from its amino acid sequence provides important insights into the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with the highly variable protein sequence lengths and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several state-of-the-art template-based and deep learning-based methods. The evaluation results on the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/
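The pairwise step described above, transferring the fold type of the most similar template, can be sketched as a nearest-neighbour lookup over fixed-size embeddings. This is an illustration under stated assumptions: cosine similarity is assumed as the similarity measure (the abstract does not name one), and `transfer_fold`, the fold labels, and the three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def transfer_fold(query_embedding, templates):
    """Return (fold label, similarity) of the template closest to the query.

    templates: list of (fold_label, embedding) pairs with known folds.
    Cosine similarity is an assumption made for this sketch.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), label)
        for label, emb in templates
    ]
    best_score, best_label = max(scored)
    return best_label, best_score


# Hypothetical template embeddings (real embeddings would be much larger).
toy_templates = [
    ("fold_A", [1.0, 0.0, 0.0]),
    ("fold_B", [0.0, 1.0, 0.0]),
    ("fold_C", [0.0, 0.0, 1.0]),
]
toy_query = [0.1, 0.9, 0.2]

print(transfer_fold(toy_query, toy_templates))
```

Because every domain is reduced to an embedding of the same size, the comparison does not depend on sequence length.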

Repositorio Institucional Universidad de Granada

RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

Crossref

Adaptive local learning in sampling based motion planning for protein folding

Author: Amato Nancy
Ekenna Chinwe
Thomas Shawna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

BACKGROUND: Simulating protein folding motions is an important problem in computational biology. Motion planning algorithms, such as Probabilistic Roadmap Methods, have been successful in modeling the folding landscape. Probabilistic Roadmap Methods and their variants contain several phases (i.e., sampling, connection, and path extraction). Most of the time is spent in the connection phase, and selecting which variant to employ is a difficult task. Global machine learning has been applied to the connection phase but is inefficient in situations with varying topology, such as those typical of folding landscapes. RESULTS: We develop a local learning algorithm that exploits the past performance of methods within the neighborhood of the current connection attempts as a basis for learning. It is sensitive not only to different types of landscapes but also to differing regions in the landscape itself, removing the need to explicitly partition the landscape. We perform experiments on 23 proteins of varying secondary structure makeup with 52–114 residues. We compare the success rate when using our methods and other methods. We demonstrate a clear need for learning (i.e., only learning methods were able to validate against all available experimental data) and show that local learning is superior to global learning, producing, in many cases, significantly higher quality results than the other methods. CONCLUSIONS: We present an algorithm that uses local learning to select appropriate connection methods in the context of roadmap construction for protein folding. Our method removes the burden of deciding which method to use, leverages the strengths of the individual input methods, and is extensible to include other future connection methods.
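The idea of learning from the past performance of methods in the neighborhood of the current connection attempt can be sketched as follows. This is not the authors' algorithm: the k-nearest-neighbour neighborhood, the Laplace smoothing, the weighted random choice, and the method names "closest" and "random" are all assumptions made for illustration.

```python
import math
import random


def local_success_rates(history, query, methods, k=4):
    """Smoothed success rate of each method among the k nearest past attempts.

    history: list of (point, method, succeeded) records from earlier attempts.
    query: coordinates of the current connection attempt.
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    rates = {}
    for method in methods:
        outcomes = [ok for _, used, ok in nearest if used == method]
        # Laplace smoothing keeps locally untried methods selectable.
        rates[method] = (sum(outcomes) + 1) / (len(outcomes) + 2)
    return rates


def choose_method(history, query, methods, k=4, rng=random):
    """Pick a method with probability proportional to its local success rate."""
    rates = local_success_rates(history, query, methods, k)
    weights = [rates[m] for m in methods]
    return rng.choices(methods, weights=weights, k=1)[0]


# Invented history: "closest" worked near the origin, "random" near (5, 5).
toy_methods = ["closest", "random"]
toy_history = [
    ((0.0, 0.0), "closest", True),
    ((0.1, 0.0), "closest", True),
    ((0.0, 0.1), "random", False),
    ((0.1, 0.1), "random", False),
    ((5.0, 5.0), "random", True),
    ((5.1, 5.0), "random", True),
    ((5.0, 5.1), "closest", False),
    ((5.1, 5.1), "closest", False),
]

print(local_success_rates(toy_history, (0.05, 0.05), toy_methods))
print(local_success_rates(toy_history, (5.05, 5.05), toy_methods))
print(choose_method(toy_history, (0.05, 0.05), toy_methods))
```

In this toy data the preferred method differs between the two regions, which a single global success rate per method (0.5 for both here) would not reveal.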

Crossref

Springer - Publisher Connector

Texas A&M Repository

PubMed Central

Adaptive local learning in sampling based motion planning for protein folding

Author: A Gareth
A Matouschek
A Yershova
AL Beberg
AR Viguera
C Ekenna
C Louis-Jeune
Chinwe Ekenna
D Berenson
D Hsu
DG Covell
DJ Jacobs
DS Riddle
E Plaku
F Chiti
G Song
H Günther
HM Berman
I Al-Bluwi
J Cortés
J Kuszewski
JC Martínez
JD Bryngelson
JK Uhlmann
K Teilum
L Mayne
L Zhang
LA Munishkina
LE Kavraki
M Levitt
M Levitt
M Morales
MS Apaydin
MS Smyth
Nancy M. Amato
NM Amato
P Abbeel
P Auer
Q Yi
R Li
S Arya
S Matysiak
S Nauli
S Rodriguez
S Thomas
SE Jackson
Shawna Thomas
T Jo
T Liu
TE Wales
V Muñoz
V Villegas
VP Grantcharova
WA Eaton
Y Shen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Molecular Evolutionary Studies using Structural Genomics and Proteomics.

Author: Xu Jinrui
Publication date: 01/01/2015

The field of molecular evolution has progressed with the accumulation of various molecular data. It started with the analysis of protein sequence data, followed by that of gene and genome sequence data. Recently, structural genomics and proteomics have offered new types of data for addressing molecular evolution questions. Structural genomics refers to the genome-wide collection of protein structures, whereas proteomics is the study of all proteins in a cell or organism. In this thesis, I conducted molecular evolutionary projects using data provided by structural genomics and proteomics. First, I used protein structure information to explain why some human-disease associated amino acid residues (DARs) appear as the wild-type in other species. Because destabilizing protein structures is a primary reason why DARs are deleterious, I focused on protein stability and discovered that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This finding of compensatory residue substitutions has important implications for understanding epistasis in protein evolution. Second, the recently published human proteomes include peptides encoded by annotated pseudogenes, which are relics of formerly functional genes. These translated pseudogenes may actually be functional and subject to purifying selection. Alternatively, their translation may be accidental and not indicate functionality. My analysis suggests that a sizable fraction of the translated pseudogenes are subject to purifying selection acting at the protein level. Third, for the purpose of understanding protein evolution and structure-function relationships, protein structures are classified according to their structural similarities. A fold encompasses protein structures with similar core topologies. Current fold classifications implicitly assume that folds are discrete islands in the protein structure space, whereas increasing evidence supports a continuous fold space. I developed a likelihood method to classify structures into existing folds by considering the continuity in fold space. My results using this method demonstrated the growing importance of considering this continuity in fold classification. Together, my work illustrated the utility of structural genomics and proteomics in answering evolutionary questions and provided a better understanding of gene and protein evolution.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113597/1/jinruixu_1.pd

Deep Blue Documents at the University of Michigan