ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

Authors: Cao Renzhi; Chan Leong; Chen Zhangxin; Freitas Colton; Jiang Haiqing; Sun Miao
Publication date: 01/10/2017

With the development of next-generation sequencing techniques, determining protein sequences has become fast and cheap, but extracting useful information from them remains relatively slow and expensive because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in closing the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" into "GOLan". We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its performance on selected proteins whose functions were released after the CAFA competition. The good performance on the training and testing datasets demonstrates that the proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to it.

Comment: 13 pages, 5 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Inferring gene regulatory networks using ensembles of feature selection techniques

Authors: Demeester Piet; Dhaene Tom; Geurts Pierre; Huynh-thu Vân anh; Ruyssinck Joeri; Saeys Yvan
Publication date: 01/01/2012

Ghent University Academic Bibliography

Stable Feature Selection for Biomarker Discovery

Authors: He Zengyou; Yu Weichuan
Publication date: 01/01/2010

Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received growing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic as a convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.
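The notion of stability under sampling variation can be made concrete with a small sketch. The abstract does not say which stability measure the review uses; the average pairwise Jaccard similarity below is one common choice, assumed here for illustration, and the function names and gene labels are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two selected-feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over feature subsets.

    Each subset is assumed to come from running the same feature
    selector on a different resample (e.g. bootstrap) of the data.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three resamples of the same dataset.
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3"}]
print(selection_stability(runs))
```

A score near 1 means the selector returns nearly the same biomarkers regardless of which samples it sees; a score near 0 means the selection is driven largely by sampling noise.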

arXiv.org e-Print Archive

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository

An Overview of the Use of Neural Networks for Data Mining Tasks

Authors: Alberts B; Alpaydin E; Ando T; Blake CL; Bramer MA; Castanheira LG; Han J; Lu H; Mitchell M; Ni X; Quinlan RJ; Rumelhart DE; Shafer JC; Shendure J; Simić D; Stahl F; Steinwart I; Surjandari I; Wei JS; Widrow B; Witten IH; Zaslavsky B; Zhang D
Publication venue: Wiley
Publication date: 01/01/2012

In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular, biologically inspired intelligent methodologies whose classification, prediction, and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking, and telecommunication. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

Authors: Gilbert D; Reyes J A
Publication venue: JIB
Publication date: 01/01/2007

This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems that can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins that really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can rest on unreliable assumptions, which could bias the classification results. Here we propose the use of one-class classification (OCC) methods for the prediction of PPI. OCC methods use examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task and found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques under different scenarios, for example varying the number of negative examples used for training. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets and validated these results by searching for biological evidence in the literature.
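The idea of a Parzen one-class classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the leave-one-out threshold rule, the `reject_fraction` parameter, and the two-dimensional feature vectors are all assumptions made for illustration. The model is fitted on positive examples only, so no negative set has to be chosen.

```python
import math


def parzen_density(x, train, h):
    """Parzen estimate at x: mean of isotropic Gaussian kernels on train."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    total = 0.0
    for t in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(train) * norm)


def fit_threshold(train, h, reject_fraction=0.1):
    """Density cut-off chosen from leave-one-out densities of the positives.

    Roughly `reject_fraction` of the training positives fall below it.
    """
    loo = []
    for i, x in enumerate(train):
        others = train[:i] + train[i + 1:]
        loo.append(parzen_density(x, others, h))
    loo.sort()
    idx = min(int(reject_fraction * len(loo)), len(loo) - 1)
    return loo[idx]


def predicts_interaction(x, train, h, threshold):
    """Accept x as a positive (interacting pair) if its density is high enough."""
    return parzen_density(x, train, h) >= threshold


# Hypothetical feature vectors for known interacting protein pairs.
positives = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3),
             (0.1, -0.2), (-0.2, -0.1), (0.3, 0.2)]
threshold = fit_threshold(positives, h=0.5)
print(predicts_interaction((0.05, 0.05), positives, 0.5, threshold))
print(predicts_interaction((5.0, 5.0), positives, 0.5, threshold))
```

Because only positives enter the density estimate, the decision does not depend on how non-interacting pairs would have been sampled, which is the property the abstract highlights.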

CiteSeerX

Directory of Open Access Journals

Brunel University Research Archive