
    Evolving temporal association rules with genetic algorithms

    A novel framework for mining temporal association rules by discovering itemsets with a genetic algorithm is introduced. Metaheuristics have already been applied to association rule mining; we show the efficacy of extending them to another variant, temporal association rule mining. Our framework enhances existing temporal association rule mining methods because it employs a genetic algorithm to search the rule space and the temporal space simultaneously. A methodology for validating the ability of the proposed framework isolates target temporal itemsets in synthetic datasets. The Iterative Rule Learning method successfully discovers these targets in datasets with varying levels of difficulty.
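
    To illustrate why the temporal space is worth searching alongside the rule space, the sketch below computes the support of an itemset inside a time window. It is not the paper's fitness function: the name `temporal_support`, the `(timestamp, items)` transaction layout, and the half-open window are assumptions made for this example only.

```python
def temporal_support(transactions, itemset, start, end):
    """Fraction of transactions with start <= timestamp < end that contain itemset.

    transactions: list of (timestamp, set_of_items) pairs (assumed layout).
    """
    window = [items for timestamp, items in transactions if start <= timestamp < end]
    if not window:
        return 0.0
    hits = sum(1 for items in window if itemset <= items)
    return hits / len(window)


# Toy data: {"a", "b"} is frequent early on and absent later.
transactions = [
    (0, {"a", "b"}),
    (1, {"a", "b", "c"}),
    (2, {"a"}),
    (3, {"c"}),
    (4, {"b", "c"}),
    (5, {"c"}),
]

whole_period = temporal_support(transactions, {"a", "b"}, 0, 6)
early_window = temporal_support(transactions, {"a", "b"}, 0, 3)
```

    On this toy data the itemset has a higher support in the early window than over the whole period, which is the kind of pattern a search restricted to the full dataset would understate.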

    HASAR : mining sequential association rules for atherosclerosis risk factor analysis

    We present HASAR, a hybrid approach for extracting adaptive sequential association rules. The method extracts association rules between events occurring in subsequent time intervals using closed itemset extraction and evolutionary techniques. An important feature is its capacity to consider different time intervals depending on the semantics of the attributes. We applied the method to the analysis of long-term medical observations of atherosclerosis risk factors for cardiovascular disease prevention. Experimental results show that it is well suited to extracting knowledge from temporal data in which interesting patterns have observation periods of different lengths.

    A scalable algorithm for the market basket analysis

    The market basket is defined as an itemset bought together by a customer on a single visit to a store. Market basket analysis is a powerful tool for implementing cross-selling strategies. In retailing especially, where thousands of items are involved, it is essential to discover large baskets. Although some algorithms can find large itemsets, they can be inefficient in terms of computational time. The aim of this paper is to present an algorithm that discovers large itemset patterns for market basket analysis. The approach uses condensed data obtained by transforming the market basket problem into a maximum-weighted clique problem. First, the input dataset is transformed into a graph-based structure; then the maximum-weighted clique problem is solved with a meta-heuristic approach in order to find the most frequent itemsets. The computational results show large itemset patterns with good scalability properties.
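
    The transformation described above can be sketched as follows, assuming that graph nodes are items and that each edge weight counts the baskets containing both endpoints. The abstract does not specify its graph construction or its meta-heuristic, so `build_cooccurrence_graph` and `greedy_clique` are hypothetical names, and the greedy search is only a simple stand-in for a meta-heuristic.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(baskets):
    """Return {(item_a, item_b): weight}; weight = baskets containing both items."""
    weights = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            weights[(a, b)] += 1
    return weights


def edge_weight(weights, a, b):
    """Look up an undirected edge stored under its sorted key."""
    return weights.get((a, b) if a < b else (b, a), 0)


def greedy_clique(weights, min_weight=1):
    """Greedy stand-in (assumption, not the paper's method): grow a clique
    from the heaviest edge, adding the item that contributes most weight."""
    if not weights:
        return []
    items = {item for edge in weights for item in edge}
    seed = max(weights, key=lambda edge: (weights[edge], edge))
    clique = set(seed)
    while True:
        best, best_gain = None, 0
        for candidate in sorted(items - clique):
            ws = [edge_weight(weights, candidate, member) for member in clique]
            # A candidate must be connected strongly enough to every member.
            if min(ws) >= min_weight and sum(ws) > best_gain:
                best, best_gain = candidate, sum(ws)
        if best is None:
            return sorted(clique)
        clique.add(best)


baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["eggs", "beer"],
    ["beer", "chips"],
]
graph = build_cooccurrence_graph(baskets)
large_itemset = greedy_clique(graph)
```

    A clique of pairwise co-occurring items is only a candidate itemset: pairwise weights do not guarantee that all items appear together in a single basket, so a real pipeline would presumably verify the candidate against the original data.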

    Application of data mining techniques in bioinformatics

    With the widespread use of databases and the explosive growth in their sizes, there is a need to use these massive volumes of data effectively. Data mining addresses this need by searching databases for hidden patterns and information that support decision making and hypothesis testing. Bioinformatics, an emerging field that relies on large databases, can be searched effectively with data mining techniques to derive useful rules. Based on the type of knowledge mined, data mining techniques [1] can be broadly classified into association rules, decision trees and clustering. Until recently, biology lacked the tools to analyze massive repositories of information such as the human genome database [3]. Data mining techniques are used to extract meaningful relationships from these data, notably in microarray analysis, which studies the activity of different cells under different conditions.

    Two algorithms for each mining technique were implemented for a large database and compared with each other:
    1. Association rule mining: (a) Apriori, (b) partition.
    2. Clustering: (a) k-means, (b) k-medoids.
    3. Classification rule mining: decision tree generation using (a) the Gini index, (b) the entropy value.

    Genetic algorithms were applied to the association and classification techniques. Further, the k-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering techniques [1] were applied to a microarray dataset and compared. The microarray dataset was downloaded from the internet using the Gene Array Analyzer Software (GAAS). Clustering was based on the signal color intensity of the genes in the microarray experiment.

    The following results were obtained:
    1. Association: for smaller databases the Apriori algorithm works better than the partition algorithm, and for larger databases partition works better.
    2. Clustering: with respect to the number of interchanges, the k-medoids algorithm works better than the k-means algorithm.
    3. Classification: the results were similar for both indices (Gini index and entropy value).

    Applying a genetic algorithm improved the efficiency of the association and classification techniques. For the microarray dataset, DBSCAN was less efficient than k-means when the database was small, but for larger databases DBSCAN was more accurate and efficient in terms of number of clusters and execution time. DBSCAN execution time increased linearly with database size and was much lower than that of k-means for larger databases. Because bioinformatics involves large datasets and the need to derive results from them, data mining techniques can be put to effective use in the field [2]. The techniques can be applied to find associations among genes, cluster similar gene and protein sequences, and draw decision trees to classify genes. Further, data mining techniques can be made more efficient by applying genetic algorithms, which greatly improve the search procedure and reduce the execution time.
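
    The two split criteria compared for decision tree generation can be written down in a few lines. This is a minimal sketch of the standard definitions, not the implementation used in the study; the function names and the toy labels are chosen for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum of p * log2(p) over class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def weighted_impurity(groups, impurity):
    """Impurity of a split: size-weighted average over the child groups."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * impurity(group) for group in groups if group)


mixed = ["a", "a", "b", "b"]
perfect_split = [["a", "a"], ["b", "b"]]

gini_before = gini(mixed)
entropy_before = entropy(mixed)
gini_after = weighted_impurity(perfect_split, gini)
entropy_after = weighted_impurity(perfect_split, entropy)
```

    Both measures are maximal for an evenly mixed node and zero for a pure one, so they tend to rank candidate splits similarly, which is consistent with the similar results reported for the two indices.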

    Web Usage Mining with Evolutionary Extraction of Temporal Fuzzy Association Rules

    In Web usage mining, fuzzy association rules that have a temporal property can provide useful knowledge about when associations occur. However, there is a problem with traditional temporal fuzzy association rule mining algorithms. Some rules occur at the intersection of fuzzy sets' boundaries where there is less support (lower membership), so the rules are lost. A genetic algorithm (GA)-based solution is described that uses the flexible nature of the 2-tuple linguistic representation to discover rules that occur at the intersection of fuzzy set boundaries. The GA-based approach is enhanced from previous work by including a graph representation and an improved fitness function. A comparison of the GA-based approach with a traditional approach on real-world Web log data discovered rules that were lost with the traditional approach. The GA-based approach is recommended as complementary to existing algorithms, because it discovers extra rules. (C) 2013 Elsevier B.V. All rights reserved
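
    The boundary problem can be made concrete with triangular fuzzy sets. The sketch below assumes the common reading of the 2-tuple linguistic representation, in which a label is paired with a lateral displacement `alpha` in [-0.5, 0.5) that shifts its membership function by a fraction of the distance between neighbouring labels. The label layout, the numbers, and the function names are illustrative assumptions, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def shifted_triangular(x, a, b, c, alpha, step):
    """Same set displaced laterally by alpha * step (2-tuple style shift)."""
    shift = alpha * step
    return triangular(x, a + shift, b + shift, c + shift)


# Assumed layout: labels "low", "medium", "high" peak at 0, 5 and 10,
# so neighbouring peaks are step = 5 apart.
step = 5.0
x = 2.5  # lies exactly on the boundary between "low" and "medium"

plain_membership = triangular(x, 0.0, 5.0, 10.0)
shifted_membership = shifted_triangular(x, 0.0, 5.0, 10.0, alpha=-0.25, step=step)
```

    In this example a value on the boundary has only partial membership in "medium", and displacing the set toward the value raises that membership. This is one way a search over displacements could recover rules whose support is otherwise diluted across two neighbouring sets.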