    Deep learning methods for protein torsion angle prediction

    Background: Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. Results: We designed four different deep learning architectures to predict protein torsion angles: a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN), and a deep recurrent restricted Boltzmann machine (DReRBM), the recurrent architectures being chosen because protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict the phi and psi angles of the protein backbone. The mean absolute error (MAE) of the phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30°, respectively, on an independent dataset. The MAE of the phi angle is comparable to that of existing methods, but the MAE of the psi angle is 29°, about 2° lower than that of existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to that of a state-of-the-art method. Conclusions: Our experiments demonstrate that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
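
    The headline numbers in this abstract are mean absolute errors between predicted and observed backbone dihedral angles. A minimal sketch of how such an angular MAE might be computed is shown below; the degree units and the wrapping of differences into [-180°, 180°] are assumptions for illustration, not the authors' exact evaluation code.

        # Minimal sketch: mean absolute error for backbone dihedral angles (degrees),
        # wrapping each difference into [-180, 180] so that e.g. 179 vs -179 counts as 2 deg.
        # Illustrative assumption, not the paper's evaluation script.
        import numpy as np

        def angular_mae(pred_deg, true_deg):
            pred = np.asarray(pred_deg, dtype=float)
            true = np.asarray(true_deg, dtype=float)
            diff = (pred - true + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
            return np.mean(np.abs(diff))

        # Example: phi predictions vs. observed values for a short fragment
        phi_pred = [-60.0, -75.0, 170.0]
        phi_true = [-65.0, -70.0, -175.0]
        print(angular_mae(phi_pred, phi_true))  # ~8.33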

    Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks

    Prediction of one-dimensional protein structures such as secondary structures and contact numbers is useful for three-dimensional structure prediction and important for understanding the sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, a Q3 accuracy of 77.8% for SS, and correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to other state-of-the-art methods, and that of the CN prediction is a significant improvement over previous methods. We give a detailed formulation of the critical random networks-based prediction scheme, and examine the context dependence of prediction accuracies. In order to study the nonlinear and multi-body effects, we compare the CRNs-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRNs-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty of extracting structural features of higher order from the amino acid sequence beyond those provided by the position-specific scoring matrices. Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for publication in BIOPHYSIC
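
    The comparison at the end of the abstract is against a purely linear predictor built on position-specific scoring matrices (PSSMs). The sketch below illustrates, under assumed choices (window size, ridge regularization, random stand-in data), what such a windowed linear baseline for contact-number prediction could look like; it is not the authors' implementation.

        # Sketch of a purely linear predictor from windowed PSSM features to contact numbers.
        # Window size, regularization strength, and the random data are illustrative assumptions.
        import numpy as np
        from sklearn.linear_model import Ridge

        def window_features(pssm, half_window=7):
            """Stack 2*half_window+1 PSSM columns (20-dim each) around every residue."""
            L, _ = pssm.shape
            padded = np.vstack([np.zeros((half_window, 20)), pssm, np.zeros((half_window, 20))])
            return np.array([padded[i:i + 2 * half_window + 1].ravel() for i in range(L)])

        rng = np.random.default_rng(0)
        pssm = rng.normal(size=(120, 20))          # stand-in for a real PSSM (L x 20)
        contact_numbers = rng.normal(size=120)     # stand-in for observed contact numbers

        X = window_features(pssm)
        model = Ridge(alpha=1.0).fit(X, contact_numbers)
        predicted_cn = model.predict(X)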

    Protein Fold Recognition from Sequences using Convolutional and Recurrent Neural Networks

    The identification of a protein fold type from its amino acid sequence provides important insights about the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with the highly variable protein sequence lengths and thus extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several state-of-the-art template-based and deep learning-based methods. The evaluation results over the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/
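
    The described architecture (1D convolutions over residue-level features followed by GRU layers that produce a fixed-size embedding per domain) can be pictured roughly as in the sketch below. The feature dimension, layer sizes, and the use of the final GRU hidden state as the embedding are illustrative assumptions; the authors' actual model and hyperparameters are available at the URL above.

        # Rough sketch of a 1D-CNN + GRU encoder that maps a variable-length protein
        # (residue-level feature vectors) to a fixed-size embedding. Layer sizes are
        # illustrative assumptions, not the published hyperparameters.
        import torch
        import torch.nn as nn

        class FoldEmbedder(nn.Module):
            def __init__(self, in_features=45, conv_channels=64, embed_dim=128):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv1d(in_features, conv_channels, kernel_size=5, padding=2),
                    nn.ReLU(),
                )
                self.gru = nn.GRU(conv_channels, embed_dim, batch_first=True)

            def forward(self, x):                 # x: (batch, length, in_features)
                h = self.conv(x.transpose(1, 2))  # -> (batch, conv_channels, length)
                _, last = self.gru(h.transpose(1, 2))
                return last[-1]                   # fixed-size embedding: (batch, embed_dim)

        emb = FoldEmbedder()
        domain = torch.randn(1, 200, 45)          # one domain with 200 residues
        print(emb(domain).shape)                  # torch.Size([1, 128])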

    Deep Learning for Genomics: A Concise Overview

    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges for deep learning, since we expect from it a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, and point out potential opportunities and obstacles for future genomics applications. Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

    Learning biophysically-motivated parameters for alpha helix prediction

    Background: Our goal is to develop a state-of-the-art protein secondary structure predictor, with an intuitive and biophysically-motivated energy model. We treat structure prediction as an optimization problem, using parameterizable cost functions representing biological "pseudo-energies". Machine learning methods are applied to estimate the values of the parameters to correctly predict known protein structures. Results: Focusing on the prediction of alpha helices in proteins, we show that a model with 302 parameters can achieve a Qα value of 77.6% and an SOVα value of 73.4%. Such performance numbers are among the best for techniques that do not rely on external databases (such as multiple sequence alignments). Further, it is easier to extract biological significance from a model with so few parameters. Conclusion: The method presented shows promise for the prediction of protein secondary structure. Biophysically-motivated elementary free-energies can be learned using SVM techniques to construct an energy cost function whose predictive performance rivals the state-of-the-art. This method is general and can be extended beyond the all-alpha case described here.
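
    As a rough illustration of the pseudo-energy idea, the sketch below scores a candidate helix labeling of a sequence with a simple linear cost: a per-residue helix propensity plus a fixed penalty for opening each helix segment. The propensity values, the penalty, and the scoring form are placeholder assumptions; the paper's 302-parameter model and its SVM-based training are not reproduced here.

        # Illustrative linear "pseudo-energy" for a candidate alpha-helix labeling.
        # The per-residue propensities and the helix-initiation penalty are made-up
        # placeholders standing in for parameters that would be learned (e.g. via SVM).
        HELIX_PROPENSITY = {"A": -0.5, "L": -0.4, "E": -0.3, "G": 0.6, "P": 0.9}  # assumed values
        INIT_PENALTY = 1.2  # assumed cost for starting a new helix segment

        def pseudo_energy(sequence, labels):
            """Lower energy = more favorable. labels[i] is 'H' (helix) or 'C' (coil)."""
            energy = 0.0
            for i, (aa, lab) in enumerate(zip(sequence, labels)):
                if lab == "H":
                    energy += HELIX_PROPENSITY.get(aa, 0.0)
                    if i == 0 or labels[i - 1] != "H":
                        energy += INIT_PENALTY  # pay once per helix segment
            return energy

        print(pseudo_energy("ALEGPALE", "HHHCCHHH"))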

    Deep learning methods for mining genomic sequence patterns

    Nowadays, with the growing availability of large-scale genomic datasets and advanced computational techniques, more and more data-driven computational methods have been developed to analyze genomic data and help to solve incompletely understood biological problems. Among them, deep learning methods have been proposed to automatically learn and recognize the functional activity of DNA sequences from genomics data. Techniques for efficiently mining genomic sequence patterns will help to improve our understanding of gene regulation, and thus accelerate our progress toward using personal genomes in medicine. This dissertation focuses on the development of deep learning methods for mining genomic sequences. First, we compare the performance of deep learning models and traditional machine learning methods in recognizing various genomic sequence patterns. Through extensive experiments on both simulated data and real genomic sequence data, we demonstrate that an appropriate deep learning model can generally be constructed to successfully recognize various genomic sequence patterns. Next, we develop deep learning methods to help solve two specific biological problems: (1) inference of the polyadenylation code and (2) tRNA gene detection and functional prediction. Polyadenylation is a pervasive mechanism used by eukaryotes for regulating mRNA transcription, localization, and translation efficiency. Polyadenylation signals in plants are particularly noisy and challenging to decipher. A deep convolutional neural network approach, DeepPolyA, is proposed to predict poly(A) sites from genomic sequences of the plant Arabidopsis thaliana. It employs various deep neural network architectures and demonstrates its superiority in comparison with competing methods, including classical machine learning algorithms and several popular deep learning models. Transfer RNAs (tRNAs) represent a highly complex class of genes and play a central role in protein translation. tRNAscan-SE remains the de facto tool for identifying tRNA genes encoded in genomes. Despite its popularity and success, tRNAscan-SE is still not powerful enough to separate tRNAs from pseudo-tRNAs, and a significant number of false positives can be output as a result. To address this issue, tRNA-DL, a hybrid approach combining convolutional and recurrent neural networks, is proposed. It is shown that the proposed method can substantially reduce the false positive rate of the state-of-the-art tRNA prediction tool tRNAscan-SE. Coupled with tRNAscan-SE, tRNA-DL can serve as a useful complementary tool for tRNA annotation. Taken together, the experiments and applications demonstrate the superiority of deep learning in automatic feature generation for characterizing genomic sequence patterns.
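
    As a simplified picture of the DeepPolyA-style setup described above, the sketch below one-hot encodes a fixed-length DNA window and passes it through a small 1D convolutional classifier that outputs a poly(A)-site probability. The window length, layer sizes, and pooling choice are illustrative assumptions rather than the dissertation's actual architecture.

        # Simplified sketch of a DeepPolyA-style classifier: one-hot DNA window -> 1D CNN
        # -> probability of containing a poly(A) site. Sizes are illustrative assumptions.
        import torch
        import torch.nn as nn

        BASES = "ACGT"

        def one_hot(seq):
            x = torch.zeros(len(BASES), len(seq))
            for i, b in enumerate(seq.upper()):
                if b in BASES:
                    x[BASES.index(b), i] = 1.0
            return x

        class PolyASiteCNN(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(4, 32, kernel_size=8), nn.ReLU(),
                    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                    nn.Linear(32, 1), nn.Sigmoid(),
                )

            def forward(self, x):        # x: (batch, 4, window_length)
                return self.net(x)

        model = PolyASiteCNN()
        window = one_hot("ACGT" * 50).unsqueeze(0)   # a toy 200-nt window
        print(model(window))                          # poly(A)-site probability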

    Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by advancements in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphics processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main DNN architectures and their usefulness in Pharmacology and Bioinformatics is presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction, and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons: DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell that contributes to information processing in the brain. Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of current ML methods. Funding: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2014/049. Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; R2014/039. Instituto de Salud Carlos III; PI13/0028.