ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

Author: Cao Renzhi
Chan Leong
Chen Zhangxin
Freitas Colton
Jiang Haiqing
Sun Miao
Publication date: 01/10/2017

With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from them because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in filling the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" into "GOLan". We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its performance on selected proteins whose functions were released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to protein function prediction. Comment: 13 pages, 5 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Representing and analysing molecular and cellular function in the computer

Author: Eldridge M
Gilbert D
Helden JV
Mancuso R
Naim A
Wernisch L
Wodak SJ
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2000

Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In a second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution, and dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.

HAL AMU

DI-fusion

Brunel University Research Archive

Integration of Biological Sources: Exploring the Case of Protein Homology

Author: Boerman Tjeerd W.
Keulen Maurice van
Severing Edouard I.
Vet Paul van der
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2011

Data integration is a key issue in the domain of bioinformatics, which deals with huge amounts of heterogeneous biological data that grows and changes rapidly. This paper serves as an introduction to the field of bioinformatics and the biological concepts it deals with, and as an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioinformatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Uncertain databases are able to contain several possible worlds, and it has been proposed that they can be used to significantly reduce initial integration efforts. We propose several directions for future work where uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration.

University of Twente Research Information

Domain-mediated interactions for protein subfamily identification

Author: Han S.K.
Kim D.
Kim I.
Kim Sanguk
Kong J.
Lee H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.

포항공과대학교

Ribosome traffic on mRNAs maps to gene ontology : genome-wide quantification of translation initiation rates and polysome size regulation

Author: Ciandrini Luca
Romano M Carmen
Stansfield Ian
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Peer reviewed. Publisher PDF

Aberdeen University Research

CiteSeerX

Directory of Open Access Journals

PubMed Central

Multi-task Deep Neural Networks in Automated Protein Function Prediction

Author: Atalay Mehmet Volkan
Cetin-Atalay Rengul
Doğan Tunca
Martin Maria Jesus
Rifaioglu Ahmet Sureyya
Publication date: 01/05/2017

In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of GO terms in the GO hierarchy is related to their performance. We showed that there is no relation between the depth of GO terms in the GO hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether including noisy data in the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of the low-performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in the area of protein function prediction. We plan to further improve DEEPred by including other types of annotations from various biological data sources. We plan to make DEEPred available as an open access online tool. Comment: 19 pages, 4 figures, 4 tables
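
The abstract above reports Matthews correlation coefficients per GO category but does not say how its hierarchical evaluation aggregates them. As a rough illustration only, the standard binary form of the coefficient for a single GO term can be sketched as follows. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard binary Matthews correlation coefficient.

    tp, tn, fp, fn are confusion-matrix counts for one label
    (for example, one GO term). Returns 0.0 when the denominator
    is zero, which is a common convention and an assumption here.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a single GO term (illustrative, not from the paper).
example_mcc = matthews_corrcoef_from_counts(tp=45, tn=40, fp=10, fn=5)
print(f"MCC = {example_mcc:.2f}")
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is the scale on which the reported values of 0.75, 0.49 and 0.63 should be read.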

arXiv.org e-Print Archive

OpenMETU (Middle East Technical University)

Functional Bias and Spatial Organization of Genes in Mutational Hot and Cold Regions in the Human Genome

Author: Chuang Jeffrey H.
Li Hao
Publication date: 01/02/2004

The neutral mutation rate is known to vary widely along human chromosomes, leading to mutational hot and cold regions. We provide evidence that categories of functionally-related genes reside preferentially in mutationally hot or cold regions, the size of which we have measured. Genes in hot regions are biased toward extra-cellular communication (surface receptors, cell adhesion, immune response, etc.) while those in cold regions are biased toward essential cellular processes (gene regulation, RNA processing, protein modification, etc.). From a selective perspective, this organization of genes could minimize the mutational load on genes that need to be conserved and allow fast evolution for genes that must frequently adapt. We also analyze the effect of gene duplication and chromosomal recombination, which contribute significantly to these biases for certain categories of hot genes. Overall, our results show that genes are located non-randomly with respect to hot and cold regions, offering the possibility that selection acts at the level of gene location in the human genome. Comment: 17 pages, 6 figures, 2 tables. Accepted to PLOS Biology, Feb. 2004 issue

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare