
    Prediction of protein-protein interaction types using association rule based classification

    This article has been made available through the Brunel Open Access Publishing Fund. Copyright @ 2009 Park et al. Background: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches. Results: This work addresses pattern discovery in the interaction sites of four different interaction types, characterizes those sites, and uses the discovered patterns to predict PPI types with Association Rule Based Classification (ARBC), which comprises association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP proteins and identified 354 domain-interaction sites. Fourteen interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites with the APRIORI algorithm. Our results on classifying PPI types from the discovered association rules show that the discriminative ability of the rules can significantly affect the predictive power of classification models. We also showed that classification accuracy can be improved by using structural domain information and secondary structure content. Conclusion: The advantage of our approach is that biologically significant information can be extracted by interpreting the discovered association rules, which are understandable and interpretable. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/
    SHP was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2005-214-E00050). JAR has been supported by the Programme Alβan, the European Union Programme of High Level Scholarships for Latin America, scholarship E04D034854CL. SK was supported by Soongsil University Research Fund.
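The rule-generation step described in the abstract above (frequent itemsets found level by level in the Apriori manner, then rules that predict an interaction type) can be illustrated with a minimal sketch. This is not the authors' implementation: the item names such as `helix=high`, the `class=` prefix convention, the thresholds, and the four toy transactions are all assumptions made up for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, len(c) - 1))
                 and support(c, transactions) >= min_support]
    return frequent

def class_rules(transactions, min_support, min_confidence):
    """Rules 'antecedent -> class label' with their support and confidence."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, supp in frequent.items():
        labels = [i for i in itemset if i.startswith("class=")]
        antecedent = itemset - set(labels)
        if len(labels) != 1 or not antecedent:
            continue
        # The antecedent is a subset of a frequent itemset, so it is frequent.
        confidence = supp / frequent[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, labels[0], supp, confidence))
    return rules

# Hypothetical toy data: discretized interface properties plus a class item.
transactions = [
    frozenset({"helix=high", "hydrophobic=high", "class=obligate"}),
    frozenset({"helix=high", "hydrophobic=high", "class=obligate"}),
    frozenset({"helix=low", "hydrophobic=high", "class=transient"}),
    frozenset({"helix=low", "hydrophobic=low", "class=transient"}),
]
rules = class_rules(transactions, min_support=0.5, min_confidence=0.9)
for antecedent, label, supp, conf in rules:
    print(sorted(antecedent), "->", label, f"support={supp}", f"confidence={conf}")
```

A classifier in this style would then assign a new interaction site the label of the matching rules with the highest confidence; how the paper ranks or combines rules is not stated in the abstract, so that step is left out here.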

    Continuous Iterative Guided Spectral Class Rejection Classification Algorithm: Part 1

    This paper outlines the changes necessary to convert the iterative guided spectral class rejection (IGSCR) classification algorithm to a soft classification algorithm. IGSCR uses a hypothesis test to select clusters to use in classification and iteratively refines clusters not yet selected for classification. Both steps assume that cluster and class memberships are crisp (either zero or one). In order to make soft cluster and class assignments (between zero and one), a new hypothesis test and iterative refinement technique are introduced that are suitable for soft clusters. The new hypothesis test, called the (class) association significance test, is based on the normal distribution, and a proof is supplied to show that the assumption of normality is reasonable. Soft clusters are iteratively refined by creating new clusters using information contained in a targeted soft cluster. Soft cluster evaluation and refinement can then be combined to form a soft classification algorithm, continuous iterative guided spectral class rejection (CIGSCR)

    Mining Association Rules Based on Certainty

    Abstract: The paper proposes a new classification algorithm based on support and certainty, which scans the same dataset several times to discover frequent item sets whose length grows by a fixed increment. The algorithm produces Boolean association rules by breadth-first traversal. Experiments show that this support-and-certainty approach generates more accurate association rules than other classification algorithms and effectively improves the accuracy and perceptiveness of the rules.

    Using fuzzy association rule mining in cancer classification

    The classification of cancer tumors based on gene expression profiles has been studied extensively. A wide variety of cancer datasets have been analyzed with various gene selection and classification methods to identify the behavior of genes in tumors and to find relationships between them and disease outcome. Interpretability of the model, which in this study is provided by fuzzy rules and linguistic variables, has rarely been considered. A further goal of this study is to create a high-performance fuzzy classifier that uses a subset of significant genes selected by different types of gene selection methods. A new algorithm has been developed to identify the fuzzy rules and significant genes based on fuzzy association rule mining. First, different subsets of genes, selected by different methods, were used separately to generate primary fuzzy classifiers; the proposed algorithm was then applied to combine the genes associated in the primary classifiers and generate a new classifier. The results show that the fuzzy classifier can classify tumors with high performance while presenting the relationships between genes through linguistic variables.

    Diagnosis and Prognosis of Breast Cancer Using Multi Classification Algorithm

    Data mining is the process of analysing data from different viewpoints and condensing it into useful information. There are several types of algorithms in data mining, such as classification algorithms, regression, segmentation algorithms, association algorithms, and sequence analysis algorithms. A classification algorithm can be used to split a given data set and predict one or more discrete variables based on the other attributes in the data set. The ID3 (Iterative Dichotomiser 3) algorithm starts with the original data set S as the root node. For every unused attribute A of S, it calculates the entropy H(S) (or information gain IG(A)), and it selects the attribute with the smallest entropy (or largest information gain). A genetic algorithm (GA) is a heuristic search that imitates the process of natural selection. A genetic algorithm can select the cancer data from a given data set using GA operators such as mutation, selection, and crossover. An earlier method (KNN+GA) was not successful for breast cancer and primary tumor data. Our new algorithm, GA+ID3, readily identifies the breast cancer data in the given data set. This paper applies the multi-classification algorithm to the diagnosis and prognosis of breast cancer.
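The ID3 attribute selection mentioned in the abstract above relies on two standard quantities, the entropy H(S) of the class labels and the information gain IG(A) of splitting on an attribute A. The following is a minimal sketch of that selection step only, not the paper's GA+ID3 method; the attribute names `size` and `shape` and the four toy records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """IG(A) = H(S) minus the size-weighted entropy of the subsets split by A."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3 picks the unused attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy records; real data would have many more attributes.
rows = [
    {"size": "large", "shape": "round"},
    {"size": "large", "shape": "irregular"},
    {"size": "small", "shape": "round"},
    {"size": "small", "shape": "irregular"},
]
labels = ["malignant", "malignant", "benign", "benign"]

print(best_attribute(rows, labels, ["size", "shape"]))
```

In this toy data `size` separates the two classes completely while `shape` carries no information about the class, so the root node would split on `size`. A full ID3 implementation repeats this choice recursively on each resulting subset.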