
    A fuzzy approach for mining quantitative association rules

    During the last ten years, data mining, also known as knowledge discovery in databases, has established itself as a prominent and important research area. Mining association rules is one of the important research problems in data mining. Many algorithms have been proposed to find association rules in databases with quantitative attributes. These algorithms usually discretize the attribute domains into sharp intervals and then apply simpler algorithms developed for Boolean attributes. An example of a quantitative association rule might be "10% of married people between age 50 and 70 have at least 2 cars". Recently, fuzzy sets were suggested to represent intervals with non-sharp boundaries. Using the fuzzy concept, the above example could be rephrased as, e.g., "10% of married old people have several cars". However, if the fuzzy sets are not well chosen, anomalies may occur. In this paper we tackle this problem by introducing an additional fuzzy normalization process. We then present a definition of quantitative association rules based on fuzzy set theory and propose a new algorithm for mining fuzzy association rules. The algorithm uses generalized definitions of interest measures. Experimental results show the efficiency of the algorithm on large databases.
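The abstract does not spell out its normalization or support definitions, so the following Python sketch only illustrates the general idea under stated assumptions: triangular membership functions, hypothetical fuzzy sets for `age` and `cars`, a normalization that rescales each value's memberships so they sum to 1 (so that no record contributes more than one unit of support per attribute), and fuzzy support computed as the mean product of memberships. None of these names, sets, or formulas are taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

MembershipFn = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MembershipFn:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical fuzzy partitions; the paper's actual sets are not given.
FUZZY_SETS: Dict[str, Dict[str, MembershipFn]] = {
    "age": {
        "young": triangular(0, 20, 45),
        "middle": triangular(30, 45, 60),
        "old": triangular(45, 70, 120),
    },
    "cars": {
        "few": triangular(-1, 0, 2),
        "several": triangular(1, 3, 10),
    },
}


def normalized_memberships(value: float, sets: Dict[str, MembershipFn]) -> Dict[str, float]:
    """Scale the memberships of one value so they sum to 1 (assumed normalization)."""
    raw = {label: mu(value) for label, mu in sets.items()}
    total = sum(raw.values())
    if total == 0:
        return {label: 0.0 for label in raw}
    return {label: m / total for label, m in raw.items()}


def fuzzy_support(records: List[Dict[str, float]], itemset: List[Tuple[str, str]]) -> float:
    """Mean over records of the product of normalized memberships (product t-norm assumed)."""
    total = 0.0
    for rec in records:
        degree = 1.0
        for attr, label in itemset:
            degree *= normalized_memberships(rec[attr], FUZZY_SETS[attr])[label]
        total += degree
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent) -> float:
    """support(X u Y) / support(X), mirroring the crisp definition."""
    return fuzzy_support(records, antecedent + consequent) / fuzzy_support(records, antecedent)


records = [{"age": 50, "cars": 2}, {"age": 65, "cars": 3}, {"age": 25, "cars": 1}]
print(fuzzy_support(records, [("age", "old"), ("cars", "several")]))
print(fuzzy_confidence(records, [("age", "old")], [("cars", "several")]))
```

With these hypothetical sets, a 50-year-old belongs partly to `middle` and partly to `old`; after normalization those degrees add up to 1, so overlapping sets cannot inflate the support of a rule such as `old => several`.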

    Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

    During the last decade, various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, which emerged independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). A common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure all patterns can be discovered quickly. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available to researchers.
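The matrix-vector idea can be illustrated with a small Python sketch (not the authors' MATLAB implementation): multiplying the binary data matrix by an item indicator vector counts, per row, how many of the chosen items each row contains, and multiplying the transposed matrix by a row indicator vector counts, per column, how many supporting rows contain that item. An itemset is closed when these two steps return it unchanged, and its supporting rows together with its closure form an all-ones bicluster. The function names are illustrative, and the dense nested lists stand in for whatever bit-packed representation the actual method uses.

```python
from typing import List, Set

# Rows are transactions (or samples), columns are items (or genes/features).
Matrix = List[List[int]]


def mat_vec(m: Matrix, x: List[int]) -> List[int]:
    """Plain matrix-vector product m @ x over integers."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def supporting_rows(m: Matrix, itemset: Set[int]) -> Set[int]:
    """Rows containing every item: those where (m @ indicator) equals |itemset|."""
    x = [1 if j in itemset else 0 for j in range(len(m[0]))]
    return {i for i, hits in enumerate(mat_vec(m, x)) if hits == len(itemset)}


def closure(m: Matrix, itemset: Set[int]) -> Set[int]:
    """Items shared by all supporting rows, via the transposed product m.T @ y."""
    rows = supporting_rows(m, itemset)
    y = [1 if i in rows else 0 for i in range(len(m))]
    return {j for j, hits in enumerate(mat_vec(transpose(m), y)) if hits == len(rows)}


def is_closed(m: Matrix, itemset: Set[int]) -> bool:
    return closure(m, itemset) == set(itemset)


B = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
items = closure(B, {0})
print(items, supporting_rows(B, items))  # {0, 1} {0, 1, 2}
```

Here `{0}` is not closed, because every row containing item 0 also contains item 1; its closure `{0, 1}` paired with rows `{0, 1, 2}` is a bicluster. A frequent closed itemset miner would additionally require the number of supporting rows to reach a minimum support threshold.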

    Mining co-regulated gene profiles for the detection of functional associations in gene expression data

    Motivation: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expression switches between up- and down-regulation from one condition to another. To discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. Results: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden from traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and is competitive with biclustering techniques. Supplementary information: Supplementary data and an executable demo program of the MAP implementation are freely available at http://www.fgcz.ch/publications/ma
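The profile definition can be made concrete with a small Python sketch. It assumes expression values have already been discretized to +1 (up), -1 (down), and 0 (unchanged); this encoding, the gene names, and the function names are hypothetical, and the sketch only checks the profile condition. It does not reproduce the MAP mining algorithm or its similarity measure.

```python
from typing import Dict, List, Sequence, Set

# Discretized expression (assumed encoding): +1 up, -1 down, 0 unchanged.
Expr = Dict[str, Sequence[int]]


def profile_holds(expr: Expr, set_a: Set[str], set_b: Set[str], cond: int) -> bool:
    """True if all of A share one nonzero direction at `cond` and all of B the opposite."""
    signs_a = {expr[g][cond] for g in set_a}
    signs_b = {expr[g][cond] for g in set_b}
    if len(signs_a) != 1 or len(signs_b) != 1:
        return False  # a set is empty or internally inconsistent
    (sa,), (sb,) = signs_a, signs_b
    return sa != 0 and sb == -sa


def supporting_conditions(expr: Expr, set_a: Set[str], set_b: Set[str]) -> List[int]:
    """All conditions where the two gene sets behave as a co-regulated profile."""
    n_conditions = len(next(iter(expr.values())))
    return [c for c in range(n_conditions) if profile_holds(expr, set_a, set_b, c)]


expr = {
    "g1": [1, -1, 1, 0],
    "g2": [1, -1, 1, 1],
    "g3": [-1, 1, -1, 0],
    "g4": [-1, 1, 1, 1],
}
print(supporting_conditions(expr, {"g1", "g2"}, {"g3"}))        # [0, 1, 2]
print(supporting_conditions(expr, {"g1", "g2"}, {"g3", "g4"}))  # [0, 1]
```

Genes `g1` and `g2` switch from up to down between conditions 0 and 1, so a method that only reports genes that stay up (or stay down) throughout would not group them over conditions 0 to 2, whereas the profile test accepts them together with the contrary gene `g3`.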

    Age at Seroconversion, HLA Genotype, and Specificity of Autoantibodies in Progression of Islet Autoimmunity in Childhood

    Context: Children with initial autoantibodies to either insulin (IAA) or glutamic acid decarboxylase (GADA) differ in peak age of seroconversion and have different type 1 diabetes (T1D) risk gene associations, suggesting heterogeneity in the disease process. Objective: To compare the associations of age at seroconversion, HLA risk, and specificity of secondary autoantibodies with the progression of islet autoimmunity between children with either IAA or GADA as their first autoantibody. Design and methods: A cohort of 15,253 children with HLA-associated increased risk of T1D participated in a follow-up program in which islet autoantibodies were regularly measured. The median follow-up time was 6.7 years. Spearman correlation, Kaplan-Meier survival plots, and Cox proportional hazards models were used for statistical analyses. Results: Persistent positivity for at least one of the tested autoantibodies was detected in 998 children; 388 children progressed to clinical T1D. Young age at initial seroconversion was associated with a high probability of expansion of IAA-initiated autoimmunity and progression to clinical diabetes, whereas expansion of GADA-initiated autoimmunity and progression to diabetes were not dependent on initial seroconversion age. The strength of HLA risk affected the progression of both IAA- and GADA-initiated autoimmunity. The simultaneous appearance of two other autoantibodies increased the rate of progression to diabetes compared with that of a single secondary autoantibody among subjects with GADA-initiated autoimmunity, but not among those with IAA as the first autoantibody. Conclusions: The findings emphasize the differences in the course of islet autoimmunity initiated by either IAA or GADA, supporting heterogeneity in the pathogenic process. Peer reviewed.
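Kaplan-Meier survival plots are built from the product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i is the number of subjects still at risk just before t_i. The Python sketch below computes it on a few hypothetical follow-up times; the data and the function name are illustrative assumptions and do not come from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.

    events[i] is True if the endpoint was observed at times[i], False if the
    subject was censored (left follow-up without the endpoint) at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        leaving = 0
        # Group tied times; censored subjects at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            events_at_t += int(data[i][1])
            leaving += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and event flags, not data from the study.
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
```

In a setting like the one described, the endpoint could be diagnosis of clinical T1D, and children who left follow-up undiagnosed would be censored. Censored subjects leave the risk set without lowering S(t), which is why this curve has no step at t=4.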