1,488 research outputs found

    Predicting protein-protein interactions as a one-class classification problem

    Protein-protein interactions represent a key step in understanding protein function, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been used to predict protein-protein interactions, but most of them treat the task as a binary classification problem. While it is easy to obtain a dataset of interacting protein pairs as positive examples, there are no experimentally confirmed non-interacting pairs to serve as a negative set. Therefore, in this paper we treat the task as a one-class classification problem using a One-Class SVM (OCSVM). Trained only on positive examples (interacting protein pairs), the OCSVM achieves an accuracy of 80%. These results imply that protein-protein interactions can be predicted by a one-class classifier with reliable accuracy.
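The abstract does not spell out the OCSVM formulation or solver. Below is a minimal pure-Python sketch of the standard ν-one-class SVM dual (Schölkopf et al.): minimize ½αᵀKα subject to 0 ≤ αᵢ ≤ 1/(νn) and Σαᵢ = 1, using an RBF kernel and a simple projected-gradient solver in place of a production solver such as libsvm. It assumes each protein pair is already encoded as a numeric feature vector; the names `rbf`, `project_capped_simplex` and `OneClassSVM`, and the hyperparameter values, are hypothetical choices for illustration.

```python
import math

def rbf(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def project_capped_simplex(v, cap, iters=60):
    """Euclidean projection of v onto {a : 0 <= a_i <= cap, sum(a) = 1}.

    The projection has the form clip(v_i - tau, 0, cap); tau is found by bisection.
    Requires len(v) * cap >= 1 so that the set is non-empty.
    """
    lo, hi = min(v) - cap, max(v)
    for _ in range(iters):
        tau = (lo + hi) / 2
        if sum(min(max(x - tau, 0.0), cap) for x in v) > 1.0:
            lo = tau  # too much mass: increase the shift
        else:
            hi = tau
    tau = (lo + hi) / 2
    return [min(max(x - tau, 0.0), cap) for x in v]

class OneClassSVM:
    """Hypothetical minimal nu-one-class SVM trained on positive examples only."""

    def __init__(self, nu=0.5, gamma=0.5, steps=2000):
        self.nu, self.gamma, self.steps = nu, gamma, steps

    def fit(self, X):
        n = len(X)
        cap = 1.0 / (self.nu * n)  # upper bound on each dual variable
        K = [[rbf(a, b, self.gamma) for b in X] for a in X]
        alpha = [1.0 / n] * n
        lr = 1.0 / n  # n bounds the largest eigenvalue of an RBF Gram matrix
        for _ in range(self.steps):
            grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
            alpha = project_capped_simplex([a - lr * g for a, g in zip(alpha, grad)], cap)
        grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        tol = 1e-6
        # rho is the kernel score shared by "margin" support vectors (0 < alpha < cap)
        margin = [g for a, g in zip(alpha, grad) if tol < a < cap - tol]
        if margin:
            self.rho = sum(margin) / len(margin)
        else:
            # no margin vector: use the midpoint of the interval allowed by the KKT conditions
            bounded = [g for a, g in zip(alpha, grad) if a >= cap - tol]
            zeros = [g for a, g in zip(alpha, grad) if a <= tol]
            self.rho = (max(bounded, default=0.0) + min(zeros, default=max(grad))) / 2
        self.X, self.alpha = X, alpha
        return self

    def decision_function(self, x):
        """Positive inside the learned support region, negative outside."""
        return sum(a * rbf(xi, x, self.gamma) for a, xi in zip(self.alpha, self.X)) - self.rho

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0 else -1
```

A new pair is labelled as interacting (+1) when its decision value is non-negative; ν upper-bounds the fraction of training positives treated as outliers, so it acts as the main knob in the absence of negative examples.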

    Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

    Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for predicting protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but the interfaces of different protein types differ markedly: their characterising properties can vary considerably depending on the interaction type and function. Selecting an optimal set of features to characterise the protein interface, and developing an effective method to represent and capture complex protein recognition patterns, are of paramount importance for this task.
    Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as the classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared with other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI).
    Conclusions: The 3D Zernike descriptors capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface that arise from the various spatial arrangements of the underlying residues, and their use can easily be extended to other sets of amino acid properties. The results suggest that choosing a proper set of features to characterise the protein interface is crucial for the interface prediction task, and that the optimal choice strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that an optimal set of features must be identified for each protein class.

    Discriminating between surfaces of peripheral membrane proteins and reference proteins using machine learning algorithms

    In biology, the cell membrane is an important component of a cell and usually works as a “fence” separating the inside of the cell from the outside. Its key role is to protect the cell from interference by its surroundings by preventing molecules from entering it. However, cells also need to keep communicating with their surroundings to acquire nutrients and other necessary molecules in order to stay alive and grow. For this reason, membrane proteins act as molecular carriers that participate in molecular communication and regulate biological activities. There are two kinds of membrane proteins: integral and peripheral. In this project, we focus only on the latter. Unlike integral membrane proteins, which span the whole membrane, peripheral membrane proteins only attach to the membrane surface through various interactions. Because peripheral proteins are also soluble, it is difficult to distinguish them from other kinds of proteins (i.e. non-membrane-binding ones) by sequence or structure. In this project, we develop a method to predict from its structure whether a protein is membrane-binding or not, based on two machine learning algorithms: k-nearest neighbors (KNN) and support vector machine (SVM). We train one model with each algorithm; the two models are then used to classify new proteins and to compare their performance. For example, by collecting different protein features, adjusting the algorithms' parameters, or changing the size and structure of the dataset, we can improve the algorithms' performance and predict the protein type more accurately. We also use the ROC curve and AUC to present overall performance, and cross-validation to verify the results.
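The thesis abstract does not give its feature set, value of k, or validation scheme. As a hedged sketch, the KNN half of the workflow (classify, score, cross-validate, summarise with AUC) might look like this in plain Python, assuming each protein is already a numeric feature vector and labels are 1 = membrane-binding, 0 = reference protein. All function names are hypothetical; leave-one-out is used here as one concrete form of cross-validation, and AUC is computed as the probability that a random positive outscores a random negative, which equals the area under the ROC curve.

```python
import math
from collections import Counter

def nearest_labels(train_X, train_y, x, k):
    """Labels of the k training vectors closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return [train_y[i] for i in order[:k]]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest neighbours."""
    return Counter(nearest_labels(train_X, train_y, x, k)).most_common(1)[0][0]

def knn_score(train_X, train_y, x, k=3):
    """Fraction of the k nearest neighbours labelled 1, used as a ranking score for ROC/AUC."""
    return sum(nearest_labels(train_X, train_y, x, k)) / k

def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def loocv_auc(X, y, k=3):
    """Leave-one-out cross-validation: score each protein with a model fitted on all the others."""
    scores = [knn_score(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) for i in range(len(X))]
    return roc_auc(y, scores)

# Toy, made-up 2-D feature vectors: two well-separated groups.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (5.5, 5.5)))  # 1
print(loocv_auc(X, y))                # 1.0
```

Using the neighbour vote fraction as a score, rather than only the hard label, is what makes an ROC curve possible for KNN; an SVM would supply its decision value in the same role.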
For the problems in this field, several challenges must also be considered, such as feature collection, the analysis and handling of highly varied data, and the choice of machine learning algorithms based on functional requirements, data structure, efficiency and other factors. In this project, we encounter these challenges and address them with effective methods.
Master's Thesis in Informatics INF39

    Deep Topology Classification: A New Approach for Massive Graph Classification

    The classification of graphs is a key challenge in the many scientific fields that use graphs to represent data, and it is an active area of research. Graph classification can be critical for identifying and labelling unknown graphs within a dataset, and it has been applied across many scientific fields. Graph classification poses two distinct problems: classifying elements within a graph and classifying the entire graph. Whilst there is considerable work on the first problem, the efficient and accurate classification of massive graphs into one or more classes has, thus far, received less attention. In this paper we propose the Deep Topology Classification (DTC) approach for global graph classification. DTC extracts both global and vertex-level topological features from a graph to create a highly discriminative representation in feature space. A deep feed-forward neural network is designed and trained to classify these graph feature vectors. This approach is shown to be over 99% accurate at discerning graph classes on two datasets. Additionally, it is shown to be more accurate than current state-of-the-art approaches in both binary and multi-class graph classification tasks.
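The abstract does not list DTC's actual features. The sketch below only illustrates the general idea of turning a graph of any size into one fixed-length vector by combining global measures with summaries of vertex-level measures; the chosen features (vertex and edge counts, density, degree statistics, local clustering coefficient statistics) and the function name `topology_features` are assumptions, not the paper's feature set. A feed-forward network would then take such vectors as input.

```python
import statistics
from itertools import combinations

def topology_features(n, edges):
    """Hypothetical global + vertex-level feature vector for an undirected graph.

    n: number of vertices, labelled 0..n-1; edges: iterable of (u, v) pairs.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)
    m = sum(len(a) for a in adj) // 2
    degrees = [len(a) for a in adj]
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Local clustering coefficient: fraction of neighbour pairs that are themselves linked.
    clustering = []
    for a in adj:
        d = len(a)
        if d < 2:
            clustering.append(0.0)
            continue
        links = sum(1 for x, y in combinations(a, 2) if y in adj[x])
        clustering.append(2 * links / (d * (d - 1)))

    return [
        float(n), float(m), density,                       # global features
        float(statistics.mean(degrees)), float(max(degrees)),
        statistics.pstdev(degrees),                        # degree distribution summary
        float(statistics.mean(clustering)), float(max(clustering)),  # clustering summary
    ]

print(topology_features(3, [(0, 1), (1, 2), (0, 2)]))  # triangle
```

Summarising per-vertex values with statistics such as mean, maximum and standard deviation is what keeps the vector length independent of the number of vertices, which matters for massive graphs of differing sizes.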

    Prediction of transmembrane regions of β-barrel proteins using ANN- and SVM-based methods

    This article describes a method for predicting transmembrane β-barrel regions in membrane proteins using two machine learning techniques: an artificial neural network (ANN) and a support vector machine (SVM). The ANN used in this study is a feed-forward neural network trained with the standard back-propagation algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information, in the form of a multiple sequence alignment obtained from PSI-BLAST, was added to the single-sequence input. We also developed an SVM-based method that takes the primary sequence as input and achieved an accuracy of 77.4%. The SVM model was then modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, the ANN- and SVM-based methods were combined to exploit the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) of the SVM, ANN, and combined methods are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using leave-one-out cross-validation (LOOCV). Based on this study, we have developed a web server, TBBPred, for predicting transmembrane β-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).
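The abstract reports both accuracy and MCC. For reference, a short sketch of how the two are computed from binary labels, assuming residue-level labels with 1 = transmembrane β-strand residue and 0 = any other residue (the function names are hypothetical). MCC is (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), taken as 0 when any factor in the denominator is zero.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = transmembrane residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of residues whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: 2 transmembrane residues out of 10, and a predictor that never predicts one.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
never = [0] * 10
print(accuracy(truth, never), mcc(truth, never))  # 0.8 0.0
```

The toy case shows why MCC is reported alongside accuracy: when one class dominates, a trivial predictor can reach high accuracy while its MCC stays at zero.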

    Classification of glomerular hypercellularity using convolutional features and support vector machine

    Glomeruli are histological structures of the kidney cortex, formed by interwoven blood capillaries, that are responsible for blood filtration. Glomerular lesions impair the kidney's filtration capability, leading to protein loss and metabolic waste retention. One example of such a lesion is glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. With this in mind, we propose a new approach for classifying hypercellularity in human kidney images. Our method introduces a novel convolutional neural network (CNN) architecture combined with a support vector machine, achieving near-perfect average results on the FIOCRUZ data set in binary classification (lesion or normal). Our deep-learning-based classifier outperformed the state-of-the-art results on the same data set. Additionally, we classified hypercellularity sub-lesions (mesangial, endocapillary, and both lesions together); in this multi-class task, our method failed in only 4% of the cases. To the best of our knowledge, this is the first study of deep learning on a data set of glomerular hypercellularity images of human kidneys. Comment: 26 pages