Protein family classification using multiple-class neural networks.

Author: Zhang Xi
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004

The objective of genomic sequence analysis is to retrieve important information from the vast amount of genomic sequence data, such as DNA, RNA and protein sequences. The main tasks include interpreting the function of DNA sequences on a genomic scale, comparing genomes to gain insight into the universality of biological mechanisms and into the details of gene structure and function, determining the structure of all proteins, and classifying protein families. With its many features and capabilities for recognition, generalization and classification, artificial neural network technology is well suited for sequence analysis. Many methods have been devised to determine whether a given protein sequence is a member of a given protein superfamily. This is a binary classification problem, and efficient neural network techniques for solving it are reported in the literature. In this Master's thesis, we consider the problem of classifying given protein sequences into one among at least three protein families using neural networks, and propose two methods for this problem: the Pair-wise Multiple Classification Approach and the Single Network Approach. In the Pair-wise Multiple Classification Approach, several sub-networks are employed to perform the task, whereas a compact network system is used in the Single Network Approach. We performed experiments using the SNNS and UOWNNS neural network simulators on our NNs with different input/output representations, and reported accuracies as high as 95%. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z54. Source: Masters Abstracts International, Volume: 43-01, page: 0248. Adviser: Alioune Ngom. Thesis (M.Sc.)--University of Windsor (Canada), 2004.
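As a rough illustration of the pair-wise idea in the abstract above, the sketch below combines one binary classifier per pair of families by majority vote. The voting rule, the toy 1-D prototypes, and all names (`pairwise_vote`, `make_pair`, `prototypes`) are assumptions made for this example; the thesis abstract does not say how its sub-networks are combined, and real sub-networks would be trained neural networks over sequence features.

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(x, classes, pair_classifiers):
    """Classify x by majority vote over one binary classifier per class pair.

    pair_classifiers maps (a, b) to a function that returns a or b for input x.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pair_classifiers[(a, b)](x)] += 1
    # Ties are broken by the order of `classes` (first listed wins).
    return max(classes, key=lambda c: votes[c])

# Hypothetical stand-ins for trained sub-networks: each pair classifier
# picks whichever of its two 1-D family prototypes is closer to x.
prototypes = {"A": 0.0, "B": 5.0, "C": 10.0}

def make_pair(a, b):
    return lambda x: a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b

classes = sorted(prototypes)
pair_classifiers = {(a, b): make_pair(a, b) for a, b in combinations(classes, 2)}

label = pairwise_vote(4.0, classes, pair_classifiers)
```

With K families this scheme needs K(K-1)/2 binary classifiers, which is the main cost relative to a single multi-output network.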

Scholarship at UWindsor

Nonlinear Models Using Dirichlet Process Mixtures

Author: Neal Radford M.
Shahbaba Babak
Publication date: 01/01/2007

We introduce a new nonlinear model for classification, in which we model the joint distribution of the response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component. We use simulated data to compare the performance of this new approach to a simple multinomial logit (MNL) model, an MNL model with quadratic terms, and a decision tree model. We also evaluate our approach on a protein fold classification problem, and find that our model provides substantial improvement over previous methods, which were based on Neural Networks (NN) and Support Vector Machines (SVM). Protein folding classes have a hierarchical structure. We extend our method to classification problems where a class hierarchy is available. We find that using the prior information regarding the hierarchical structure of protein folds can result in higher predictive accuracy.
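To make the "linear within each component, nonlinear overall" point concrete, here is a minimal sketch with a fixed two-component mixture and a binary response. It is not the authors' model: a Dirichlet process mixture would infer the number of components and their parameters from data, and the paper uses a multinomial logit within components. The component parameters and function names below are hypothetical.

```python
import math

def gauss(x, mu, sigma):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical components: (mixing weight, mean of x, std of x, intercept, slope).
# Within each component, the log-odds of y = 1 are linear in x.
components = [
    (0.5, -2.0, 1.0, 4.0, 2.0),   # P(y=1) rises with x around x = -2
    (0.5, 2.0, 1.0, 4.0, -2.0),   # P(y=1) falls with x around x = 2
]

def p_y1_given_x(x):
    """P(y=1 | x): per-component logits weighted by component responsibilities."""
    resp = [w * gauss(x, mu, s) for w, mu, s, _, _ in components]
    total = sum(resp)
    return sum(r / total * sigmoid(a + b * x)
               for r, (_, _, _, a, b) in zip(resp, components))

curve = [(x, p_y1_given_x(x)) for x in (-4.0, 0.0, 4.0)]
```

The resulting probability is low at both ends and high in the middle, a shape that no single logit model that is linear in x can produce.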

arXiv.org e-Print Archive

CiteSeerX

Protein Super family Classification using Artificial Neural Networks

Author: Vulisetty Anuja Swetha
Publication date: 20/04/1991

Classification, or supervised learning, is one of the major data mining processes. Pattern recognition involves assigning a label to a given input value, and protein classification is a pattern recognition problem. The classification of protein sequences is an important tool in the annotation of structural and functional properties of newly discovered proteins. Protein superfamily classification is used in drug discovery, prediction of molecular functions and medical diagnosis. Many techniques can be used for classification tasks, such as statistical techniques, decision trees, support vector machines and neural networks. In this work, a feed-forward neural network approach is used. Neural networks were chosen for the protein sequence classification task for two reasons: the features extracted from protein sequences are distributed in a high-dimensional space and have complex characteristics that are difficult to model satisfactorily with parameterized approaches; and the rules produced by decision tree techniques are complex and difficult to understand because the features are extracted from long character strings. This work presents a comparative study of training a feed-forward neural network with three algorithms: the backpropagation algorithm, the Levenberg-Marquardt algorithm, and the backpropagation algorithm with a genetic algorithm as optimiser. The efficiency of the three algorithms is measured in terms of convergence rate and classification accuracy. Keywords: ANN (artificial neural network), backpropagation algorithm, Levenberg-Marquardt algorithm, genetic algorithm

ethesis@nitr

Convolutional Neural Network-Based Artificial Intelligence for Classification of Protein Localization Patterns

Author: Huttunen Riku
Latonen Leena
Liimatainen Kaisa
Ruusuvuori Pekka
Publication venue: 'MDPI AG'
Publication date: 27/10/2022

Identifying the localization of proteins and their specific subpopulations associated with certain cellular compartments is crucial for understanding protein function and interactions with other macromolecules. Fluorescence microscopy is a powerful method to assess protein localizations, and there is increasing demand for automated high-throughput analysis methods to supplement the technical advancements in high-throughput imaging. Here, we study the applicability of deep neural network-based artificial intelligence to the classification of protein localization in 13 cellular subcompartments. We use deep learning based on a convolutional neural network and a fully convolutional network with similar architectures for the classification task, aiming at accurate classification but, importantly, also at a comparison of the networks. Our results show that both types of convolutional neural networks perform well in protein localization classification tasks for major cellular organelles. Yet, in this study, the fully convolutional network outperforms the convolutional neural network in classification of images with multiple simultaneous protein localizations. We find that the fully convolutional network, whose output visualizes the identified localizations, is a very useful tool for systematic protein localization assessment.

UTUPub

Protein Sequences Classification Using Modular RBF Neural Networks

Author: C. H. Wu
C. M. Bishop
D. M.J. Tax
E.A. Ferran
H. C. Wang
J. Moody
S. Lawrence
Y. S. Hwang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002

A protein superfamily consists of proteins which share amino acid sequence homology and which may therefore be functionally and structurally related. One benefit of this grouping is that some hint of function may be deduced for individual members from information on other members of the family. Traditionally, two protein sequences are classified into the same class if they have high homology in terms of feature patterns extracted through sequence alignment algorithms. These algorithms compare an unseen protein sequence with all the identified protein sequences and return the higher-scoring sequences. As protein sequence databases are very large, exhaustive comparison against existing protein sequences is very time-consuming. Therefore, there is a need to build an improved classification system for effectively identifying protein sequences. This paper presents a modular neural classifier for protein sequences with improved classification criteria. The intelligent classification techniques described in this paper aim to enhance the performance of single neural classifiers based on a centralized information structure in terms of recognition rate, generalization and reliability. The architecture of the proposed model is a modular RBF neural network with a compensational combination at the transition output layer. The connection weights between the final output layer and the transition output layer are optimized by the delta rule and serve as an integrator of the local neural classifiers. To enhance the classification reliability, we present two heuristic rules to apply to decision-making. Two sets of protein sequences with ten classes of superfamilies, downloaded from a public domain database, the Protein Information Resource (PIR), are used in our simulation study. Performance comparisons are carried out between single neural classifiers and the proposed modular neural classifier.
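The delta-rule integration step mentioned in the abstract above can be sketched as follows, assuming the final output is a plain weighted sum of the local classifiers' outputs. This is an illustration under that assumption only: the paper's actual layer structure, its "compensational combination" and its heuristic decision rules are not reproduced, and the toy data and the name `delta_rule_fit` are hypothetical.

```python
def delta_rule_fit(module_outputs, targets, lr=0.1, epochs=200):
    """Learn combination weights w so that sum(w[i] * o[i]) tracks the target.

    module_outputs: one tuple per training example, one entry per local classifier.
    targets: desired combined output for each example.
    """
    w = [0.0] * len(module_outputs[0])
    for _ in range(epochs):
        for o, t in zip(module_outputs, targets):
            y = sum(wi * oi for wi, oi in zip(w, o))   # combined output
            err = t - y
            # Delta rule: move each weight along error * its input.
            w = [wi + lr * err * oi for wi, oi in zip(w, o)]
    return w

# Toy data: the first local classifier always matches the target,
# while the second is uninformative.
module_outputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
targets = [1.0, 0.0, 1.0, 0.0]

weights = delta_rule_fit(module_outputs, targets)
```

On this toy data the learned weights approach (1, 0), so the integrator learns to rely on the module whose output agrees with the target.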

Crossref

Unimas Institutional Repository