AUC-maximized Deep Convolutional Neural Fields for Sequence Labeling
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in
a variety of machine learning tasks. This manuscript presents Deep
Convolutional Neural Fields (DeepCNF), a combination of DCNN with Conditional
Random Field (CRF), for sequence labeling with highly imbalanced label
distribution. The widely-used training methods, such as maximum-likelihood and
maximum labelwise accuracy, do not work well on highly imbalanced data. To
handle this, we present a new training algorithm called maximum-AUC for
DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area
Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced
data. To fulfill this, we formulate AUC in a pairwise ranking framework,
approximate it by a polynomial function and then apply a gradient-based
procedure to optimize it. We then test our AUC-maximized DeepCNF on three very
different protein sequence labeling tasks: solvent accessibility prediction,
8-state secondary structure prediction, and disorder prediction. Our
experimental results confirm that maximum-AUC greatly outperforms the other two
training methods on 8-state secondary structure prediction and disorder
prediction, whose label distributions are highly imbalanced, and performs
similarly to them on the solvent accessibility prediction problem, which has
three equally distributed labels.
Furthermore, our experimental results also show that our AUC-trained DeepCNF
models greatly outperform existing popular predictors of these three tasks.
Comment: Under review as a conference paper at ICLR 201
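The pairwise ranking view of AUC described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, scores are assumed to lie in [0, 1] so that score differences fall in [-1, 1], and the cubic used here is only an illustrative smooth stand-in for the step function, not the polynomial approximation used in the paper.

```python
from itertools import product


def split_scores(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def pairwise_auc(scores, labels):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    pos, neg = split_scores(scores, labels)
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def cubic_step(d):
    """Illustrative cubic stand-in for the step function on [-1, 1].

    Maps -1 to 0, 0 to 0.5, and 1 to 1, and is monotone in between.
    Assumption: chosen for this sketch only.
    """
    return 0.5 + 0.75 * d - 0.25 * d ** 3


def surrogate_auc(scores, labels):
    """Smooth AUC surrogate: average cubic_step over all pairwise differences.

    Unlike pairwise_auc, this is differentiable in the scores, so a
    gradient-based optimizer could in principle increase it.
    """
    pos, neg = split_scores(scores, labels)
    total = sum(cubic_step(p - n) for p, n in product(pos, neg))
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(pairwise_auc(scores, labels))
    print(surrogate_auc(scores, labels))
```

Because the exact AUC is a sum of step functions over pairs, its gradient is zero almost everywhere; replacing the step with a smooth polynomial is what makes gradient-based training possible.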
Deep learning in bioinformatics: introduction, application, and perspective in big data era
Deep learning, which is especially formidable in handling big data, has
achieved great success in various fields, including bioinformatics. With the
advances of the big data era in biology, it is foreseeable that deep learning
will become increasingly important in the field and will be incorporated into
the vast majority of analysis pipelines. In this review, we provide both an
accessible introduction to deep learning and concrete examples and
implementations of its representative applications in bioinformatics. We start
from the recent achievements of deep learning in the bioinformatics field,
pointing out the problems that are well suited to deep learning. After that,
we introduce deep learning in an easy-to-understand fashion, from shallow
neural networks to the well-known convolutional and recurrent neural
networks, graph neural networks, generative adversarial networks,
variational autoencoders, and the most recent state-of-the-art architectures.
We then provide eight examples, covering five bioinformatics research
directions and all four kinds of data types, with implementations written
in TensorFlow and Keras. Finally, we discuss the common issues, such as
overfitting and interpretability, that users will encounter when adopting deep
learning methods and provide corresponding suggestions. The implementations are
freely available at https://github.com/lykaust15/Deep_learning_examples