Multi-task Deep Neural Networks in Automated Protein Function Prediction
In recent years, deep learning algorithms have outperformed state-of-the-art
methods in several areas, thanks to efficient methods for training and for
preventing overfitting, advances in computer hardware, and the availability of
vast amounts of data. The high performance of multi-task deep neural networks
in drug discovery has drawn attention to deep learning algorithms in
bioinformatics. Here, we propose a hierarchical multi-task deep neural network
architecture based on Gene Ontology (GO) terms as a solution to the protein
function prediction problem, and we investigate various aspects of the proposed
architecture in several experiments. First, we show that there is a positive
correlation between the performance of the system and the size of the training
datasets. Second, we investigate whether the level of a GO term in the GO
hierarchy is related to its performance, and we find no relation between the
depth of GO terms in the hierarchy and their performance. In addition, we
included all annotations in the training of a set of GO terms to investigate
whether adding noisy data to the training datasets changes the performance of
the system. The results show that including less reliable annotations in the
training of deep neural networks significantly increased the performance of the
low-performing GO terms. We evaluated the performance of the system using a
hierarchical evaluation method. The Matthews correlation coefficient was
calculated as 0.75, 0.49 and 0.63 for the molecular function, biological
process and cellular component categories, respectively. We show that deep
learning algorithms have great potential in protein function prediction. We
plan to further improve DEEPred by including other types of annotations from
various biological data sources, and to release DEEPred as an open-access
online tool.
Comment: 19 pages, 4 figures, 4 tables
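For readers unfamiliar with the metric reported above, the Matthews correlation coefficient for a single binary label can be computed from confusion-matrix counts. The sketch below is only illustrative: the function name and the counts are hypothetical, the abstract does not describe how the hierarchical evaluation aggregates per-term results, and returning 0.0 when the denominator is zero is a common convention rather than something stated in the source.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a convention assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for one GO term; not taken from the paper.
    print(round(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10), 3))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy when positive and negative classes are imbalanced.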
Capturing Evolution Genes for Time Series Data
The modeling of time series is becoming increasingly critical in a wide
variety of applications. Overall, data evolves by following different patterns,
which are generally caused by different user behaviors. Given a time series, we
define the evolution gene to capture the latent user behaviors and to describe
how the behaviors lead to the generation of time series. In particular, we
propose a uniform framework that recognizes different evolution genes of
segments by learning a classifier, and adopt an adversarial generator to
implement the evolution gene by estimating the segments' distribution.
Experimental results based on a synthetic dataset and five real-world datasets
show that our approach not only achieves good prediction results (e.g., an
average improvement of 10.56% in F1), but is also able to provide explanations
of the results.
Comment: a preprint version. arXiv admin note: text overlap with
arXiv:1703.10155 by other authors
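As a reminder of what the F1 figure measures, the following minimal sketch computes F1 as the harmonic mean of precision and recall for one class. The function name and the zero-division handling are assumptions made for illustration; the abstract does not say how F1 was averaged across classes or datasets.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention when nothing is predicted or present
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts; not taken from the paper.
    print(f1_score(tp=8, fp=2, fn=2))
```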
TITER: predicting translation initiation sites by deep learning.
Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.
Methods: We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.
Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.
Availability and implementation: TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
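To make the idea of "surrounding sequence contexts" concrete, here is a rough sketch of how candidate start codons might be located in an RNA string and their flanking windows one-hot encoded as input for a neural network. This is not TITER's actual preprocessing: the function names, the flank length, and the candidate codon set (AUG plus two commonly cited near-cognate codons) are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple

NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> List[List[int]]:
    """Encode an RNA string as rows of four indicators in A, C, G, U order."""
    return [[int(base == n) for n in NUCLEOTIDES] for base in seq]


def candidate_windows(
    rna: str,
    codons: Tuple[str, ...] = ("AUG", "CUG", "GUG"),  # illustrative set only
    flank: int = 3,  # hypothetical flank length, not from the paper
) -> Iterator[Tuple[int, str]]:
    """Yield (codon position, window) for each candidate codon with full flanks."""
    for i in range(flank, len(rna) - 3 - flank + 1):
        if rna[i:i + 3] in codons:
            yield i, rna[i - flank:i + 3 + flank]


if __name__ == "__main__":
    rna = "GCCACCAUGGCGUAA"  # toy sequence, not real data
    for pos, window in candidate_windows(rna):
        print(pos, window, one_hot(window))
```

Candidates too close to either end of the sequence are skipped here so that every window has the same length; padding the sequence instead would be an equally reasonable choice.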