Search CORE

56 research outputs found

Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

Author: Khanal Reecha
Publication venue: ScholarWorks@UNO
Publication date: 01/04/2019
Field of study

Identification and annotation of RNA Binding Proteins (RBPs) and RNA Binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions including transcriptional regulation of RNAs and RNA metabolism splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful to annotate RBP and assist the experimental design. Here, we introduce AIRBP, a computational sequence-based method, which utilizes features extracted from evolutionary information, physiochemical properties, and disordered properties to train a machine learning method designed using stacking, an advanced machine learning technique, for effective prediction of RBPs. Furthermore, it makes use of efficient machine learning algorithms like Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting Algorithm). In this research work, we also propose another predictor for efficient annotation of RBP residues. This RBP residue predictor also uses stacking and evolutionary algorithms for efficient annotation of RBPs and RNA Binding residue. The RNA-binding residue predictor also utilizes various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches

Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks

Author: Li Zhen
Yu Yizhou
Publication venue
Publication date: 01/01/2016
Field of study

Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering long-range dependencies existing in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent unit to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available.Comment: 8 pages, 3 figures, Accepted by International Joint Conferences on Artificial Intelligence (IJCAI

arXiv.org e-Print Archive

HKU Scholars Hub

Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

Author: Khanal Reecha
Publication venue: ScholarWorks@UNO
Publication date: 01/04/2019
Field of study

University of New Orleans

Accurate prediction of protein relative solvent accessibility using a balanced model

Author: B Lee
B Petersen
C Chothia
C Fan
C Mirabello
CN Magnan
DT Jones
E Faraggi
G Pollastri
G Pollastri
G Wang
I Walsh
J Eickholt
J Eickholt
J Hoskins
J Lafferty
J Lafferty
J Sim
J Sun
J Zhang
JY Wang
K Joo
KI Cho
L Wang
P Cong
Peisheng Cong
PW Rose
R Adamczak
S Liu
S Wang
S Wang
S Wang
SF Altschul
Tonghua Li
W Kabsch
W Li
Wei Wu
WR Atchley
Z Tang
Zhiheng Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction

Author: Lanchantin Jack
Lin Zeming
Qi Yanjun
Publication venue
Publication date: 21/02/2016
Field of study

Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window based multilayer perceptron. Taking inspiration from the image classification domain we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position predictions on protein sequences. Our model is significantly simpler than the state-of-the-art, yet achieves better results. By combining MUST and the efficient convolution operation, we can consider far more parameters while retaining very fast prediction speeds. We beat the state-of-the-art performance on two large protein property prediction datasets.Comment: 8 pages ; 3 figures ; deep learning based sequence-sequence prediction. in AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Protein Fold Recognition from Sequences using Convolutional and Recurrent Neural Networks

Author: Gómez García Ángel Manuel
Morales Cordovilla Juan Andrés
Sánchez Calle Victoria Eugenia
Villegas Morcillo Amelia Otilia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/12/2021
Field of study

The identification of a protein fold type from its amino acid sequence provides important insights about the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated to the highly variable protein sequence lengths and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several template-based and deep learning-based methods from the state-of-the-art. The evaluation results over the well-known LINDAHL and SCOP_TEST sets,along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, specially at the fold level. Supplementary material, source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/

Repositorio Institucional Universidad de Granada

Structural Property Prediction

Author: Abeln Sanne
Bawono Punto
Bouwmeester Robbin
Dijkstra Maurits
Feenstra K. Anton
Gavaldá-Garciá Jose
Heringa Jaap
Houtkamp Isabel
Okounev Mascha
Stringer Bas
van Gils Juami H. M.
Publication venue
Publication date: 06/09/2023
Field of study

While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction into Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. Some structural properties of proteins that are closely linked to their function may be easier (or much faster) to predict from sequence than the complete tertiary structure; for example, secondary structure, surface accessibility, flexibility, disorder, interface regions or hydrophobic patches. Serving as building blocks for the native protein fold, these structural properties also contain important structural and functional information not apparent from the amino acid sequence. Here, we will first give an introduction into the application of machine learning for structural property prediction, and explain the concepts of cross-validation and benchmarking. Next, we will review various methods that incorporate knowledge of these concepts to predict those structural properties, such as secondary structure, surface accessibility, disorder and flexibility, and aggregation.Comment: editorial responsability: Juami H. M. van Gils, K. Anton Feenstra, Sanne Abeln. This chapter is part of the book "Introduction to Protein Structural Bioinformatics". The Preface arXiv:1801.09442 contains links to all the (published) chapter

arXiv.org e-Print Archive