Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks
Protein secondary structure prediction is an important problem in
bioinformatics. Inspired by the recent successes of deep neural networks, in
this paper, we propose an end-to-end deep network that predicts protein
secondary structures from integrated local and global contextual features. Our
deep architecture leverages convolutional neural networks with different kernel
sizes to extract multiscale local contextual features. In addition, considering
long-range dependencies existing in amino acid sequences, we set up a
bidirectional neural network consisting of gated recurrent units to capture
global contextual features. Furthermore, multi-task learning is utilized to
predict secondary structure labels and amino-acid solvent accessibility
simultaneously. Our proposed deep network demonstrates its effectiveness by
achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public
benchmark CB513, 76.9% Q8 accuracy on CASP10 and 73.1% Q8 accuracy on CASP11.
Our model and results are publicly available.
Comment: 8 pages, 3 figures, Accepted by International Joint Conferences on
Artificial Intelligence (IJCAI
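The Q8 figures quoted in this abstract are per-residue accuracies over the eight secondary structure classes. As a minimal sketch of how such a score can be computed, assuming predictions and labels are given as equal-length strings of one-letter class codes: the function name `q8_accuracy` and the toy strings are hypothetical and are not taken from the authors' released code.

```python
def q8_accuracy(predicted, observed):
    """Fraction of residues whose predicted 8-state label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical toy example (not benchmark data): 16 residues, one mismatch.
observed = "HHHHEEEETTSSGGGL"
predicted = "HHHHEEEETTSSGGLL"
score = q8_accuracy(predicted, observed)
print(f"Q8 accuracy: {score:.4f}")
```

This sketch does not handle padded or masked positions, which benchmark evaluations usually have to exclude.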
Convolutional LSTM Networks for Subcellular Localization of Proteins
Machine learning is widely used to analyze biological sequence data.
Non-sequential models such as SVMs or feed-forward neural networks are often
used, although they have no natural way of handling sequences of varying length.
Recurrent neural networks such as the long short-term memory (LSTM) model, on
the other hand, are designed to handle sequences. In this study we demonstrate
that LSTM networks predict the subcellular location of proteins given only the
protein sequence with high accuracy (0.902), outperforming current
state-of-the-art algorithms. We further improve the performance by introducing convolutional
filters and experiment with an attention mechanism which lets the LSTM focus on
specific parts of the protein. Lastly, we introduce new visualizations of both
the convolutional filters and the attention mechanisms and show how they can be
used to extract biologically relevant knowledge from the LSTM networks.
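The attention mechanism mentioned in this abstract can be illustrated with a generic softmax-weighted pooling over per-position hidden states. This is only a sketch of the general technique, not the authors' model: the helper names `softmax` and `attention_pool`, the toy hidden states, and the assumption that one scalar score per position is already available are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Return (weights, context): context is the weighted sum of hidden states."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three sequence positions, two hidden units each.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_weights, uniform_context = attention_pool(hidden_states, [0.0, 0.0, 0.0])
focused_weights, focused_context = attention_pool(hidden_states, [0.0, 0.0, 5.0])
print(focused_weights)
```

Plotting the weights along the sequence is one simple way to see which positions the model emphasized, in the spirit of the visualizations the abstract describes.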
DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.
Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can be of enormous benefit to both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the GitHub repository http://github.com/uci-cbcl/DanQ
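To make the idea of a convolution layer capturing motifs concrete, the sketch below one-hot encodes a DNA string and slides a single filter along it, applying a ReLU to each window score. It is an illustration under stated assumptions rather than DanQ's implementation: the helper names `one_hot` and `conv1d_scan` are hypothetical, the filter is set by hand to match "TATA" (in a trained network the filters are learned), and the recurrent layer is not shown.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide the kernel along the sequence; return ReLU-activated window scores."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][c] * kernel[i][c]
            for i in range(width)
            for c in range(len(BASES))
        )
        activations.append(max(score, 0.0))  # ReLU keeps only positive responses
    return activations


# Hypothetical hand-set filter that responds most strongly to the motif "TATA".
kernel = one_hot("TATA")
activations = conv1d_scan(one_hot("GCTATAAG"), kernel)
best = max(range(len(activations)), key=activations.__getitem__)
print(activations, "strongest match starts at index", best)
```

The resulting activation track is the kind of signal a downstream recurrent layer could read in both directions to relate motifs that are far apart.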