67,526 research outputs found

    Protein family classification using multiple-class neural networks.

    The objective of genomic sequence analysis is to retrieve important information from the vast amount of genomic sequence data, such as DNA, RNA and protein sequences. The main tasks include interpreting the function of DNA sequences on a genomic scale, comparing genomes to gain insight into the universality of biological mechanisms and into the details of gene structure and function, determining the structure of all proteins, and classifying proteins into families. With its many features and capabilities for recognition, generalization and classification, artificial neural network technology is well suited for sequence analysis. Many methods have been devised to determine whether a given protein sequence is a member of a given protein superfamily. This is a binary classification problem, and efficient neural network techniques for solving it are described in the literature. In this Master's thesis, we consider the problem of classifying given protein sequences into one of at least three protein families using neural networks, and propose two methods for this problem: the Pair-wise Multiple Classification Approach and the Single Network Approach. In the Pair-wise Multiple Classification Approach, several sub-networks are employed to perform the task, whereas a compact network system is used in the Single Network Approach. We performed experiments using the SNNS and UOWNNS neural network simulators on our NNs with different input/output representations, and report accuracies as high as 95%. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z54. Source: Masters Abstracts International, Volume: 43-01, page: 0248. Adviser: Alioune Ngom. Thesis (M.Sc.)--University of Windsor (Canada), 2004.
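
    One common way to build a multi-class decision out of pair-wise sub-classifiers is majority voting over all class pairs. The sketch below illustrates that idea only; the abstract does not say how the thesis combines its sub-networks, so the voting rule is an assumption, and the names `pairwise_vote`, `make_nearest`, and `CENTERS` are hypothetical. Simple nearest-center rules stand in for trained neural sub-networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D "class centers"; real sub-networks would be trained on
# features extracted from protein sequences.
CENTERS = {"A": 0.0, "B": 5.0, "C": 10.0}


def make_nearest(a, b):
    """Stand-in binary classifier: pick whichever of a, b has the nearer center."""
    def classify(x):
        return a if abs(x - CENTERS[a]) <= abs(x - CENTERS[b]) else b
    return classify


def pairwise_vote(x, classes, pair_classifiers):
    """Run every pair-wise classifier on x and return the class with most votes.

    Ties are broken by the order of `classes` (an arbitrary choice).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pair_classifiers[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])


CLASSES = ["A", "B", "C"]
PAIR_CLASSIFIERS = {(a, b): make_nearest(a, b) for a, b in combinations(CLASSES, 2)}

print(pairwise_vote(4.0, CLASSES, PAIR_CLASSIFIERS))  # B
```

    With k families this scheme needs k(k-1)/2 binary sub-classifiers, whereas a single network would use one model with k outputs.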

    Nonlinear Models Using Dirichlet Process Mixtures

    We introduce a new nonlinear model for classification, in which we model the joint distribution of the response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component. We use simulated data to compare the performance of this new approach to a simple multinomial logit (MNL) model, an MNL model with quadratic terms, and a decision tree model. We also evaluate our approach on a protein fold classification problem, and find that our model provides substantial improvement over previous methods, which were based on Neural Networks (NN) and Support Vector Machines (SVM). Protein folding classes have a hierarchical structure. We extend our method to classification problems where a class hierarchy is available. We find that using prior information about the hierarchical structure of protein folds can result in higher predictive accuracy.
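
    To see why a mixture of locally linear components gives a nonlinear overall relationship, consider the minimal sketch below. It is an illustration under strong assumptions, not the paper's model: it uses a fixed two-component mixture with hand-picked parameters instead of a Dirichlet process mixture fitted by sampling, a single covariate, and a binary response. All names and numbers (`COMPONENTS`, `prob_y1_given_x`) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


# Each component: (mixing weight, mean of x, sd of x, intercept, slope).
# Within a component, the log-odds of y = 1 are linear in x.
COMPONENTS = [
    (0.5, -2.0, 1.0, 2.0, 2.0),
    (0.5, 2.0, 1.0, 2.0, -2.0),
]


def prob_y1_given_x(x):
    """P(y = 1 | x): component-wise logistic models, weighted by how likely
    each component is to have generated this x."""
    resp = [w * normal_pdf(x, m, s) for w, m, s, _, _ in COMPONENTS]
    total = sum(resp)
    return sum(
        (r / total) * sigmoid(a + b * x)
        for r, (_, _, _, a, b) in zip(resp, COMPONENTS)
    )


for x in (-4.0, 0.0, 4.0):
    print(x, round(prob_y1_given_x(x), 4))
```

    The resulting probability is low at x = -4, high at x = 0, and low again at x = 4. A single logistic model, whose probability is monotone in x, cannot produce that shape.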

    Protein Super family Classification using Artificial Neural Networks

    Classification, or supervised learning, is one of the major data mining processes. Pattern recognition involves assigning a label to a given input value, and protein classification is a pattern recognition problem. The classification of protein sequences is an important tool for annotating the structural and functional properties of newly discovered proteins. Protein superfamily classification is used in drug discovery, prediction of molecular functions and medical diagnosis. Many techniques can be applied to classification tasks, such as statistical techniques, decision trees, support vector machines and neural networks. In this work, a feed-forward neural network approach is used. Neural networks were chosen for the protein sequence classification task for two reasons: the features extracted from protein sequences are distributed in a high-dimensional space and have complex characteristics that are difficult to model satisfactorily with parameterized approaches; and the rules produced by decision tree techniques are complex and difficult to understand because the features are extracted from long character strings. This work presents a comparative study of training a feed-forward neural network with three algorithms: the back-propagation algorithm, the Levenberg-Marquardt algorithm, and the back-propagation algorithm with a genetic algorithm as optimiser. The efficiency of the three algorithms is measured in terms of convergence rate and classification accuracy. Keywords: ANN (artificial neural network), back-propagation algorithm, Levenberg-Marquardt algorithm, genetic algorithm
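
    For orientation, the textbook weight updates for two of the compared algorithms can be written side by side, assuming a sum-of-squares error. The symbols (error vector e, Jacobian J, learning rate η, damping factor μ) are notation introduced here and do not come from the abstract. The genetic-algorithm variant is omitted because the abstract does not describe how it is combined with back-propagation.

```latex
\begin{align}
E(\mathbf{w}) &= \tfrac{1}{2}\,\mathbf{e}(\mathbf{w})^{\top}\mathbf{e}(\mathbf{w}),
  \qquad J = \frac{\partial \mathbf{e}}{\partial \mathbf{w}} \\
\Delta\mathbf{w}_{\mathrm{BP}} &= -\eta\,\nabla E = -\eta\,J^{\top}\mathbf{e} \\
\Delta\mathbf{w}_{\mathrm{LM}} &= -\left(J^{\top}J + \mu I\right)^{-1} J^{\top}\mathbf{e}
\end{align}
```

    For large μ the Levenberg-Marquardt step approaches a gradient-descent step of size 1/μ; for small μ it approaches a Gauss-Newton step, which is why the two methods can differ in convergence rate.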

    Convolutional Neural Network-Based Artificial Intelligence for Classification of Protein Localization Patterns

    Identifying the localization of proteins and their specific subpopulations associated with certain cellular compartments is crucial for understanding protein function and interactions with other macromolecules. Fluorescence microscopy is a powerful method for assessing protein localization, and there is increasing demand for automated high-throughput analysis methods to complement technical advances in high-throughput imaging. Here, we study the applicability of deep neural network-based artificial intelligence to the classification of protein localization in 13 cellular subcompartments. We use deep learning based on a convolutional neural network and a fully convolutional network with similar architectures for the classification task, aiming not only at accurate classification but also, importantly, at a comparison of the two networks. Our results show that both types of convolutional neural network perform well in protein localization classification tasks for major cellular organelles. In this study, however, the fully convolutional network outperforms the convolutional neural network in classifying images with multiple simultaneous protein localizations. We find that the fully convolutional network, whose output visualizes the identified localizations, is a very useful tool for systematic protein localization assessment.

    Protein Sequences Classification Using Modular RBF Neural Networks

    A protein superfamily consists of proteins that share amino acid sequence homology and that may therefore be functionally and structurally related. One benefit of this grouping is that some hint of function may be deduced for individual members from information on other members of the family. Traditionally, two protein sequences are assigned to the same class if they show high homology in terms of feature patterns extracted through sequence alignment algorithms. These algorithms compare an unseen protein sequence with all identified protein sequences and return the higher-scoring ones. Because protein sequence databases are very large, exhaustive comparison against existing protein sequences is very time consuming. There is therefore a need for an improved classification system that identifies protein sequences effectively. This paper presents a modular neural classifier for protein sequences with improved classification criteria. The intelligent classification techniques described in this paper aim to improve on single neural classifiers based on a centralized information structure in terms of recognition rate, generalization and reliability. The architecture of the proposed model is a modular RBF neural network with a compensational combination at the transition output layer. The connection weights between the final output layer and the transition output layer are optimized by the delta rule and serve as an integrator of the local neural classifiers. To enhance classification reliability, we present two heuristic rules to apply in decision-making. Two sets of protein sequences with ten classes of superfamilies, downloaded from a public domain database, Protein Information Resources (PIR), are used in our simulation study. Experimental results compare the performance of single neural classifiers with that of the proposed modular neural classifier.
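
    The delta rule mentioned above can be sketched for the simplest case: a linear output unit that combines the scores of several local classifiers. This is a generic illustration, not the paper's architecture; the abstract does not detail the "compensational combination", and the function name `delta_rule_fit` and the toy data are hypothetical.

```python
def delta_rule_fit(local_outputs, targets, lr=0.1, epochs=200):
    """Learn combination weights with the delta (LMS) rule.

    local_outputs: list of score vectors, one score per local classifier.
    targets: desired combined output for each score vector.
    """
    weights = [0.0] * len(local_outputs[0])
    for _ in range(epochs):
        for scores, target in zip(local_outputs, targets):
            combined = sum(w * s for w, s in zip(weights, scores))
            error = target - combined
            # Delta rule: move each weight in proportion to the output
            # error and to the input that fed that weight.
            weights = [w + lr * error * s for w, s in zip(weights, scores)]
    return weights


# Toy data: targets generated as 0.7 * score_1 + 0.3 * score_2.
LOCAL_OUTPUTS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
TARGETS = [0.7, 0.3, 1.0, 0.41]

WEIGHTS = delta_rule_fit(LOCAL_OUTPUTS, TARGETS)
print([round(w, 3) for w in WEIGHTS])
```

    On this toy data the learned weights approach 0.7 and 0.3, the values used to generate the targets, so the combiner learns how much to trust each local classifier.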