DeepSF: deep convolutional neural network for mapping protein sequences to folds

Author: Alfonso Valencia
Altschul
Altschul
Badri Adhikari
Berman
Cao
Chandonia
Cheng
Cheng
Chung
Cui
Damoulas
Dill
Dong
Eickholt
Greene
Hadley
Henikoff
Holm
Jackson
Jianlin Cheng
Jie Hou
Jo
Jo
Kalchbrenner
Kim
Kinch
Kinch
Krizhevsky
Li
Ma
Magnan
McGuffin
Murzin
Shen
Spencer
Srivastava
Söding
Wang
Wang
Wang
Webb
Wei
Xia
Xu
Zhang
Publication date: 03/06/2017

Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences into a small number of folds due to methodological limitations, and these are not generally useful in practice.

Results: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 77.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on template-free modeling targets and 4.5%-16.7% higher on hard template-based modeling targets for the top 1, 5, and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking.

Comment: 28 pages, 13 figures
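The abstract's central idea, mapping a sequence of any length to a fixed-size feature vector with a 1D convolution, can be illustrated with a minimal sketch. This is not the DeepSF implementation: the one-hot encoding, the single convolution layer, the global max pooling, the random weights, and all function names (`one_hot`, `conv1d_relu`, `global_max_pool`, `make_filters`, `embed`) are assumptions chosen only to show why the output size does not depend on sequence length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(seq):
    """Encode a sequence as L rows of 20 values; unknown residues become all zeros."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def make_filters(n_filters, width, channels, seed=0):
    """Random (untrained) filters, each a (weights[width][channels], bias) pair."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.1, 0.1) for _ in range(channels)] for _ in range(width)], 0.0)
        for _ in range(n_filters)
    ]


def conv1d_relu(x, filters, width):
    """Zero-padded 1D convolution (odd width) followed by ReLU; output has L rows."""
    pad = width // 2
    channels = len(x[0])
    zero = [0.0] * channels
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for i in range(len(x)):
        window = padded[i:i + width]
        row = []
        for weights, bias in filters:
            s = bias + sum(
                weights[k][c] * window[k][c]
                for k in range(width)
                for c in range(channels)
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Keep the maximum of each filter over all positions: L x F becomes F values."""
    return [max(column) for column in zip(*feature_map)]


def embed(seq, filters, width):
    """Map a sequence of any length to a vector with one entry per filter."""
    return global_max_pool(conv1d_relu(one_hot(seq), filters, width))


filters = make_filters(n_filters=8, width=5, channels=len(AMINO_ACIDS))
short_vec = embed("MKV", filters, 5)
long_vec = embed("MKVLAAGIVGLS" * 20, filters, 5)
print(len(short_vec), len(long_vec))
```

Both calls yield a vector with one entry per filter regardless of input length; in a trained classifier, such a vector would feed a final layer that scores each fold class.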

arXiv.org e-Print Archive

Crossref

University of Missouri, St. Louis

Link Prediction in Complex Networks: A Survey

Author: Adamic
Airoldi
Albert
Alon
Amaral
Arenas
Baiesi
Barabási
Barahona
Bayes
Bianconi
Blondel
Boccaletti
Breiman
Brin
Buntine
Burke
Butts
Caldarelli
Carmi
Casella
Casella
Chebotarev
Chu
Clauset
Colizza
Cui
da
Dasgupta
Dawah
Dorelan
Dorogovtsev
Fouss
Fouss
Gallagher
Gastner
Geisser
Getoor
Girvan
Granger
Guha
Guimerà
Guimerà
Guimerà
Guimerà
Hanely
Heckerman
Heckerman
Herlocker
Holland
Holme
Holme
Huang
Huang
Huss
Jaccard
Jeh
Jung
Kaluza
Katz
Kim
Klein
Kohavi
Kossinets
Krebs
Kunegis
Lambiotte
Leicht
Leroy
Leskovec
Liben-Nowell
Lin
Linyuan Lü
Liu
Liu
Liu
Liu
Liu
Lusseau
Lü
Lü
Mann
Manning
Mantrach
Marvel
Metropolis
Molloy
Moore
Mossel
Murata
Neal
Neville
Newman
Newman
Newman
Newman
Newman
Newman
Newman
Newman
Ou
O’Madadhain
Pan
Pastor-Satorras
Pastor-Satorras
Penrose
Perotti
Polikar
Ravasz
Redner
Reed
Reichardt
Sales-Pardo
Salton
Salton
Schafer
Schafer
Shang
Shang
Shawe-Taylor
Spiegelhalter
Spring
Stumpf
Su
Sun
Szell
Sørensen
Tao Zhou
Taskar
Tong
Traag
Tylenda
Valverde
von Mering
Vázquez
Wang
Watts
White
White
White
Wilcoxon
Xiao
Xie
Yan
Yin
Yin
Yu
Yu
Yu
Zachary
Zeng
Zhang
Zhang
Zhang
Zhang
Zhang
Zheleva
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Publication venue: 'Elsevier BV'
Publication date: 04/10/2010

Link prediction in complex networks has attracted increasing attention from both the physics and computer science communities. The algorithms can be used to extract missing information, identify spurious interactions, evaluate network evolving mechanisms, and so on. This article summarizes recent progress on link prediction algorithms, emphasizing the contributions from physical perspectives and approaches, such as random-walk-based methods and maximum likelihood methods. We also introduce three typical applications: reconstruction of networks, evaluation of network evolving mechanisms, and classification of partially labelled networks. Finally, we introduce some applications and outline future challenges of link prediction algorithms.

Comment: 44 pages, 5 figures
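As a rough illustration of what a random-walk-based link predictor can look like, the sketch below scores each unconnected node pair by the probability that a short random walk started at one node ends at the other, summed over both directions. The scoring rule, the toy graph, and the function names (`transition_probs`, `walk_score`, `rank_missing_links`) are assumptions made for this example and are not taken from the survey.

```python
def transition_probs(adj, start, steps):
    """Distribution over nodes after `steps` uniform random-walk steps from `start`."""
    probs = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, p in probs.items():
            neighbours = adj[node]
            if not neighbours:
                # An isolated node keeps its probability mass.
                nxt[node] = nxt.get(node, 0.0) + p
                continue
            share = p / len(neighbours)
            for n in neighbours:
                nxt[n] = nxt.get(n, 0.0) + share
        probs = nxt
    return probs


def walk_score(adj, x, y, steps=3):
    """Symmetric similarity: P(walk from x ends at y) + P(walk from y ends at x)."""
    return (
        transition_probs(adj, x, steps).get(y, 0.0)
        + transition_probs(adj, y, steps).get(x, 0.0)
    )


def rank_missing_links(adj, steps=3):
    """Score every currently unconnected pair and sort from most to least likely."""
    nodes = sorted(adj)
    candidates = [
        (walk_score(adj, x, y, steps), x, y)
        for i, x in enumerate(nodes)
        for y in nodes[i + 1:]
        if y not in adj[x]
    ]
    return sorted(candidates, reverse=True)


# Hypothetical undirected toy graph stored as adjacency sets.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

for score, x, y in rank_missing_links(graph):
    print(f"{x}-{y}: {score:.3f}")
```

Pairs near the top of the ranking are the candidates such a method would propose as missing links; evaluating the ranking requires hiding some known edges and checking where they land.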

arXiv.org e-Print Archive

Crossref

Automated Protein Structure Classification: A Survey

Author: Hassanzadeh Oktie
Publication date: 01/01/2008

Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we give an overview of recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.

Comment: 14 pages, Technical Report CSRG-589, University of Toronto

arXiv.org e-Print Archive

CiteSeerX

Ab initio RNA folding

Author: Cragnolini Tristan
Derreumaux Philippe
Pasquali Samuela
Publication venue: 'IOP Publishing'
Publication date: 30/12/2014

RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are being discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties, when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data-mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic approaches and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures, but also to investigate dynamical and thermodynamical behavior, and the open challenges of including more of the key interactions ruling RNA folding.

Comment: 28 pages, 18 figures

arXiv.org e-Print Archive

Hal-Diderot

Evaluation of protein surface roughness index using its heat denatured aggregates

Author: Hrishikesh Mishra
Tapobrata Lahiri
Publication date: 27/08/2009

Recent research on the potential of protein surface descriptors to predict protein surface properties has gained significance for its possible role in extracting clues about a protein's functional site. In this direction, the Surface Roughness Index, a surface topological parameter, has shown potential to predict the SCOP family of a protein. The present work builds on these studies: a semi-empirical method for evaluating the Surface Roughness Index directly from heat denatured protein aggregates (HDPA) was designed and demonstrated successfully. The method consists of extracting a feature, the Intensity Level Multifractal Dimension (ILMFD), from microscopic images of HDPA, followed by mapping ILMFD into the Surface Roughness Index (SRI) through a recurrent backpropagation network (RBPN). Finally, the SRI for a particular protein was predicted by clustering the decisions obtained by feeding multiple data into the RBPN, both to obtain the general tendency of the decisions and to discard noisy data. The cluster centre of the largest cluster was found to be the best match for the Surface Roughness Index of each protein in our study. The semi-empirical approach adopted in this paper shows a way to evaluate a protein's surface property without depending on its already evaluated structure.

Nature Precedings