
    pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties

    BACKGROUND: Protein subcellular localization is an important determinant of protein function; hence, reliable methods for predicting localization are needed. A number of prediction algorithms have been developed based on amino acid composition or on the N-terminal characteristics (signal peptides) of proteins. However, such approaches lose contextual information. Moreover, where information about the physicochemical properties of amino acids has been used, the methods employed to exploit it are suboptimal. RESULTS: In this paper, we propose a new algorithm called pSLIP, which uses Support Vector Machines (SVMs) in conjunction with multiple physicochemical properties of amino acids to predict protein subcellular localization in eukaryotes across six locations: chloroplast, cytoplasmic, extracellular, mitochondrial, nuclear and plasma membrane. Applied to the dataset provided by Park and Kanehisa, the algorithm achieved per-class prediction accuracies ranging from 87.7% to 97.0%, with an overall accuracy of 93.1%. CONCLUSION: This study presents a protein localization prediction algorithm based on physicochemical properties. Unlike other algorithms, pSLIP preserves contextual information by dividing protein sequences into clusters. Its prediction accuracy improves on that of other algorithms based on various types of amino acid composition (single, pair and gapped pair). We have also implemented a web server to predict protein localization across the six classes (available at )
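    The abstract says pSLIP keeps contextual information by dividing sequences into clusters described by physicochemical properties. The sketch below shows one plausible reading of that idea, assuming contiguous, roughly equal-length segments and a single property (the Kyte-Doolittle hydropathy scale). The segment count, the choice of scale and the function name `segment_features` are assumptions for illustration, not details taken from the paper.

```python
# Kyte-Doolittle hydropathy values, used here as one example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_features(sequence, n_segments=4, scale=KYTE_DOOLITTLE):
    """Split a protein sequence into n_segments contiguous pieces and return
    the mean property value of each piece, so positional context is kept."""
    # Residues missing from the scale (e.g. X or U) are skipped.
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if len(values) < n_segments:
        raise ValueError("sequence has fewer scored residues than segments")
    features = []
    for i in range(n_segments):
        # Integer boundaries give near-equal, non-empty segments.
        start = i * len(values) // n_segments
        end = (i + 1) * len(values) // n_segments
        chunk = values[start:end]
        features.append(sum(chunk) / len(chunk))
    return features


# Usage: one feature vector per sequence (segment count chosen arbitrarily).
print(segment_features("MKTAYIAKQRQISFVKSHFSRQ", n_segments=3))
```

    Vectors from several property scales could be concatenated and passed to an SVM implementation such as scikit-learn's `SVC`; the kernel and parameters pSLIP actually uses are not stated in the abstract.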

    Model-based classification for subcellular localization prediction of proteins


    A branching fuzzy-logic classifier for building optimization

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2005. By Matthew A. Lehar. Includes bibliographical references (p. 109-110).
    We present an input-output model that learns to emulate a complex, high-dimensional building simulation. Many multi-dimensional systems are dominated by the behavior of a small number of inputs over a limited range of input variation. Some also tend to respond strongly to certain inputs over small ranges, and to other inputs over very large ranges of input variation. A branching linear discriminant can be used to isolate regions of local linearity in the input space while also capturing the effects of scale. The quality of the classification may be improved by using a fuzzy preference relation to classify input configurations that the linear discriminant does not handle well.
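    The abstract combines a branching linear discriminant, which partitions the input space, with a fuzzy preference relation for configurations the discriminant handles poorly. The sketch below is a loose, hypothetical illustration of that combination: a binary tree of linear discriminants in which a sigmoid membership blends the two branches near each boundary. This sigmoid blending is swapped in for the thesis's fuzzy preference relation, which the abstract does not specify; the classes `Leaf` and `Node`, the `softness` parameter and the blending rule are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    # Degree of membership of each class label at this leaf (hypothetical encoding).
    label_scores: dict


@dataclass
class Node:
    # Linear discriminant w·x + b; negative scores lean to `left`, positive to `right`.
    weights: list
    bias: float
    left: "Union[Leaf, Node]"
    right: "Union[Leaf, Node]"
    # Width of the fuzzy band around the boundary (hypothetical parameter).
    softness: float = 1.0


def membership(tree, x):
    """Return fuzzy class memberships for input x by blending both branches."""
    if isinstance(tree, Leaf):
        return dict(tree.label_scores)
    score = sum(w * xi for w, xi in zip(tree.weights, x)) + tree.bias
    # Clamp to avoid math.exp overflow far from the boundary.
    z = max(-60.0, min(60.0, score / tree.softness))
    mu_right = 1.0 / (1.0 + math.exp(-z))
    result = {}
    for weight, child in ((1.0 - mu_right, tree.left), (mu_right, tree.right)):
        for label, value in membership(child, x).items():
            result[label] = result.get(label, 0.0) + weight * value
    return result


def classify(tree, x):
    """Pick the label with the highest blended membership."""
    scores = membership(tree, x)
    return max(scores, key=scores.get)
```

    Far from a boundary the sigmoid saturates and the tree behaves like an ordinary hard-branching discriminant; only inputs close to a boundary receive mixed memberships.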

    Probabilistic models for mining imbalanced relational data

    Most data mining and pattern recognition techniques are designed for learning from flat data files under the assumption of equal class populations. However, most real-world data are stored in rich relational databases that generally have imbalanced class distributions. Such domains require a relational technique that can accurately model the different objects and relationships in the domain, which cannot easily be represented as a set of simple attributes, while also handling the imbalanced class problem.
    Motivated by the significance of mining imbalanced relational databases, which represent the majority of real-world data, this thesis investigates learning techniques for mining imbalanced relational domains, with a focus on probabilistic models. In particular, it studies Probabilistic Relational Models (PRMs), which were proposed as an extension of attribute-based Bayesian networks. The effectiveness of PRMs in mining real-world databases was explored by learning PRMs from a real-world university relational database. A visual data mining tool is also proposed to aid the interpretation of the learned PRMs.
    Despite the effectiveness of PRMs in relational learning, their performance as predictive models is significantly hindered by the imbalanced class problem, because PRMs share the assumption, common to other learning techniques, of relatively balanced class distributions in the training data. This thesis therefore proposes a number of models that build on the strengths of PRMs in relational learning and extend them to mining imbalanced relational domains.
    The first model addresses mining imbalanced relational domains for a single two-class attribute by enriching PRM learning with ensemble learning. The premise is that an ensemble of models attains better performance than a single model, since instances misclassified by one model can often be classified correctly by others. Building on this approach, a second model addresses mining multiple imbalanced attributes, where several attributes must be predicted rather than a single one; it uses ensemble bagging sampling to obtain a single model for mining several attributes. Finally, the thesis outlines the problem of imbalanced multi-class classification and introduces a generalized framework for handling it in both relational and non-relational domains.
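    The ensemble premise in this abstract, where members trained on balanced samples vote so that one member's errors can be outvoted by the others, can be illustrated on flat (non-relational) data. The sketch below is a generic balanced-bagging scheme with a nearest-centroid base learner, both chosen here for brevity; the thesis builds its ensembles from PRMs, which this code does not implement, and the function names and the default of 11 members are assumptions.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Base learner (an assumption): one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))


def balanced_bagging_fit(X, y, n_models=11, seed=0):
    """Train n_models learners, each on a bootstrap sample with equal counts per class."""
    rng = random.Random(seed)
    by_class = {}
    for x, t in zip(X, y):
        by_class.setdefault(t, []).append(x)
    minority_size = min(len(rows) for rows in by_class.values())
    models = []
    for _ in range(n_models):
        Xb, yb = [], []
        for label, rows in by_class.items():
            # Sample with replacement, minority_size rows from every class.
            sample = [rng.choice(rows) for _ in range(minority_size)]
            Xb.extend(sample)
            yb.extend([label] * minority_size)
        models.append(nearest_centroid_fit(Xb, yb))
    return models


def balanced_bagging_predict(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

    Undersampling inside each bag keeps every member from being dominated by the majority class, while bagging recovers much of the majority-class information that a single undersampled model would discard.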

    Contribution to the integration of support vector machines into pattern recognition systems: application to automatic handwriting recognition

    In recent years, support vector machines (SVMs) have repeatedly demonstrated their superiority in terms of generalization. The aim of this doctoral thesis was therefore to identify the main problems involved in integrating SVMs into pattern recognition systems, in particular automatic handwriting reading systems, and to provide some answers to them. We thus addressed multi-class problems, the estimation of posterior probabilities of class membership, the acceleration of decision-making, and finally the combination with a model-based classification approach, so as to handle both ambiguous data and outliers effectively.
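    One common way to turn SVM decision values into posterior class probabilities, one of the problems this thesis addresses, is Platt scaling, which fits a sigmoid P(y = 1 | f) = 1 / (1 + exp(A·f + B)) to decision values f. The abstract does not say which method the thesis uses, so the sketch below is illustrative only. It fits A and B by plain gradient descent on the log loss, rather than with the smoothed targets and more careful optimizer of Platt's original procedure; the learning rate and epoch count are arbitrary assumptions.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit P(y=1|f) = 1 / (1 + exp(A*f + B)) by gradient descent on the log loss.

    scores: SVM decision values; labels: 0 or 1.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = _sigmoid(-(A * f + B))
            # With p = sigmoid(-(A*f + B)), d(log loss)/d(A*f + B) = y - p.
            grad_a += (y - p) * f
            grad_b += y - p
        A -= lr * grad_a / n
        B -= lr * grad_b / n
    return A, B


def predict_proba(f, A, B):
    """Posterior probability of the positive class for decision value f."""
    return _sigmoid(-(A * f + B))
```

    In practice A and B are usually fitted on decision values from a held-out calibration set, not on the SVM's own training data, so that the probabilities are not overconfident.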

    Combining Pairwise Classifiers with Stacking

    Pairwise classification is a technique for multi-class problems that converts them into a series of binary problems, one for each pair of classes. The predictions of the binary classifiers are typically combined into an overall prediction by voting and predicting the class that received the most votes. In this paper we try to generalize the voting procedure by replacing it with a trainable classifier, i.e., we propose the use of a meta-level classifier that is trained to arbitrate among the conflicting predictions of the binary classifiers.
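    The difference between plain pairwise voting and the stacking variant described above can be shown in a few lines. In the sketch below, the hypothetical `TOY` dictionary stands in for trained binary classifiers on a one-dimensional input; `pairwise_vote` implements the usual majority vote, and `meta_features` builds the vector of binary predictions on which a meta-level classifier would be trained. The meta-level learner itself is left out, since the abstract does not specify one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical trained binary classifiers, one per pair of classes; each returns
# one of the two labels it was trained to separate (thresholds are made up).
TOY = {
    ("a", "b"): lambda x: "a" if x[0] < 1.0 else "b",
    ("a", "c"): lambda x: "a" if x[0] < 1.5 else "c",
    ("b", "c"): lambda x: "b" if x[0] < 2.0 else "c",
}


def pairwise_vote(classifiers, x):
    """Standard pairwise combination: the class with the most votes wins.

    Ties go to whichever tied class Counter lists first.
    """
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


def meta_features(classifiers, classes, x):
    """Stacking input: +1 if the (a, b) classifier predicts a, -1 if it predicts b."""
    return [1 if classifiers[(a, b)](x) == a else -1 for a, b in combinations(classes, 2)]


# A meta-level learner would be trained on pairs (meta_features(...), true_label).
print(pairwise_vote(TOY, [0.5]))                   # a
print(meta_features(TOY, ["a", "b", "c"], [1.7]))  # [-1, -1, 1]
```

    Voting treats every binary prediction as equally informative; a meta-level classifier trained on these feature vectors can instead learn which pairwise classifiers to trust for which patterns of disagreement.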