46 research outputs found

    Levelwise search of frequent patterns with counting inference

    Peer-reviewed conference with proceedings (national). National audience. In this paper, we address the efficiency of the main phase of most data mining applications: frequent pattern extraction. This problem is mainly related to the number of operations required to count pattern supports in the database, and we propose a new method, called pattern counting inference, that makes it possible to perform as few support counts as possible. With this method, the support of a pattern is determined without accessing the database whenever possible, using the supports of some of its sub-patterns, called key patterns. The method was implemented in the Pascal algorithm, an optimization of the simple and efficient Apriori algorithm. Experiments comparing Pascal to the Apriori, Close and Max-Miner algorithms, each representative of a frequent pattern discovery strategy, show that Pascal improves the efficiency of frequent pattern extraction from correlated data and that it does not add execution time when data is weakly correlated.
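The counting inference idea in this abstract can be illustrated with a small levelwise sketch. It is not the authors' implementation: the function name `pascal_sketch`, the absolute threshold `min_count`, and the in-memory list of transactions are assumptions made for illustration. It relies on two properties: a pattern is key when its support differs from the minimum support of its immediate sub-patterns, and a pattern with a non-key sub-pattern is itself non-key, with support equal to that minimum.

```python
def pascal_sketch(transactions, min_count):
    """Levelwise frequent-pattern search with counting inference (illustrative).

    Returns (support, is_key, db_counts): absolute supports of the frequent
    itemsets, a flag marking key patterns, and the number of itemsets whose
    support had to be counted against the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    support = {frozenset(): n}      # the empty pattern is contained in every transaction
    is_key = {frozenset(): True}
    db_counts = 0

    def count(itemset):
        # One pass over the (hypothetical, in-memory) database.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: every single item is counted.
    level = []
    for item in {i for t in transactions for i in t}:
        single = frozenset([item])
        c = count(single)
        db_counts += 1
        if c >= min_count:
            support[single] = c
            is_key[single] = c != n  # same support as the empty pattern => not key
            level.append(single)

    k = 2
    while level:
        previous = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for cand in candidates:
            subsets = [cand - {i} for i in cand]
            if any(s not in previous for s in subsets):
                continue             # Apriori pruning: an infrequent sub-pattern
            pred = min(support[s] for s in subsets)
            if all(is_key[s] for s in subsets):
                c = count(cand)      # possibly key: the database is needed
                db_counts += 1
                key = c != pred
            else:
                c, key = pred, False # inferred without touching the database
            if c >= min_count:
                support[cand] = c
                is_key[cand] = key
                level.append(cand)
        k += 1
    return support, is_key, db_counts
```

On correlated data many patterns are non-key, so `db_counts` stays well below the number of frequent itemsets; on weakly correlated data almost every pattern is key and the sketch behaves like plain Apriori.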

    CORON: A Framework for Levelwise Itemset Mining Algorithms

    CORON is a framework for levelwise algorithms that are designed to find frequent and/or frequent closed itemsets in binary contexts. Datasets can differ widely in size, number of objects, number of attributes, density, etc. As no single algorithm is best for arbitrary datasets, we want to give users the possibility to try different algorithms and choose the one that best suits their needs.

    An Efficient Hybrid Algorithm for Mining Frequent Closures and Generators

    Conference site: http://cla2008.inf.upol.cz/. International audience. The effective construction of many association rule bases requires the computation of both frequent closed itemsets and frequent generator itemsets (FCIs/FGs). However, these two tasks are rarely combined. Most existing solutions apply levelwise breadth-first traversal, although depth-first traversal is often superior, depending on data characteristics. Hence, we present here a hybrid algorithm that combines the two traversals. The proposed algorithm, Eclat-Z, extracts frequent itemsets (FIs) in a depth-first way. Then, the algorithm filters FCIs and FGs among the FIs in a levelwise manner, and associates the generators with their closures. In Eclat-Z we present a generic technique for extending an arbitrary FI-miner algorithm so that it also supports the generation of minimal non-redundant association rules. Experimental results indicate that Eclat-Z outperforms pure levelwise methods in most cases.
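The filtering step described above (selecting FCIs and FGs among the FIs and attaching each generator to its closure) can be sketched naively as follows. This is not Eclat-Z: the function name `closures_and_generators` and the quadratic scan are assumptions for illustration, and the input is assumed to be a dictionary of all frequent itemsets, including the empty itemset, with their supports.

```python
def closures_and_generators(support):
    """Split frequent itemsets into closed itemsets and generators (illustrative).

    support: dict mapping frozenset -> support, for ALL frequent itemsets
    (the empty itemset included).
    """
    itemsets = list(support)
    # Closed: no proper superset has the same support.
    closed = [x for x in itemsets
              if not any(x < y and support[y] == support[x] for y in itemsets)]
    # Generator: no proper subset has the same support.
    generators = [x for x in itemsets
                  if not any(y < x and support[y] == support[x] for y in itemsets)]
    # The closure of a generator is the closed superset with the same support.
    closure_of = {}
    for g in generators:
        closure_of[g] = next(c for c in closed
                             if g <= c and support[c] == support[g])
    return closed, generators, closure_of


# Toy input, counted by brute force over five hypothetical transactions.
transactions = [set("abce"), set("abc"), set("ab"), set("be"), set("abce")]
items = sorted(set().union(*transactions))
all_subsets = [frozenset(items[i] for i in range(len(items)) if (mask >> i) & 1)
               for mask in range(2 ** len(items))]
toy_support = {s: sum(1 for t in transactions if s <= t) for s in all_subsets}
toy_support = {s: c for s, c in toy_support.items() if c >= 2}

closed, generators, closure_of = closures_and_generators(toy_support)
```

Each generator together with its closure yields the left-hand and right-hand sides from which minimal non-redundant rules are usually built, which is why the two families are computed together.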

    Pascal: an algorithm for extracting frequent patterns (Pascal : un algorithme d'extraction des motifs fréquents)

    International audience. In this article we propose the Pascal algorithm, which introduces a new optimization of the reference algorithm Apriori. This optimization is based on pattern counting inference, which uses the concept of key patterns. The support of frequent non-key patterns can be inferred from the support of key patterns without accessing the database. Experimentally, the comparison of Pascal with Apriori, Close and Max-Miner demonstrates its efficiency. Key patterns also make it possible to define informative association rules, which are potentially more useful than the complete set of association rules and far fewer in number.

    An Experiment on Mining Chemical Reaction Databases

    Peer-reviewed conference with proceedings (international). International audience. In this paper, we present an experiment on knowledge discovery in chemical reaction databases. Chemical reactions are the main elements on which synthesis in organic chemistry relies, which is why chemical reaction databases are of primary importance. From a problem-solving perspective, synthesis in organic chemistry must be considered at several levels of abstraction: mainly a strategic level, where general synthesis methods are involved, and a tactical level, where actual chemical reactions are applied. The research work presented in this paper aims at discovering general synthesis methods from chemical reaction databases in order to design generic and reusable synthesis plans. The knowledge discovery process relies on levelwise frequent itemset search and association rule extraction, but also on chemical knowledge involved in every step of the process. Moreover, the overall process is supervised by a domain expert.

    Efficient Mining of Frequent Closures with Precedence Links and Associated Generators

    The effective construction of many association rule bases requires the computation of frequent closures, generators, and precedence links between closures. However, these tasks are rarely combined, and no scalable algorithm exists at present for their joint computation. We propose here a method that solves this challenging problem in two separate steps. First, we introduce a new algorithm called Touch for finding frequent closed itemsets (FCIs) and their generators (FGs). Touch applies depth-first traversal, and experimental results indicate that this algorithm is highly efficient and outperforms its levelwise competitors. Second, we propose another algorithm called Snow for efficiently extracting the precedence links from the output of Touch. To do so, we apply hypergraph theory. Snow is a generic algorithm that can be used with any FCI/FG-miner. The two algorithms, Touch and Snow, provide a complete solution for constructing iceberg lattices. Furthermore, due to their modular design, parts of the algorithms can also be used independently.
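To make "precedence links between closures" concrete, here is a naive sketch that derives them directly from a list of closed itemsets. It does not use the hypergraph technique the abstract attributes to Snow; the function name `precedence_links` and the cubic scan are assumptions for illustration. A link (lower, upper) is kept when lower is strictly contained in upper and no other closed itemset lies strictly between them.

```python
def precedence_links(closed):
    """Return the cover relation of a family of closed itemsets (illustrative)."""
    closed = [frozenset(c) for c in closed]
    links = set()
    for lower in closed:
        for upper in closed:
            # Keep the pair only if no closed itemset sits strictly in between.
            if lower < upper and not any(lower < mid < upper for mid in closed):
                links.add((lower, upper))
    return links


# Hypothetical closed itemsets of a toy dataset.
toy_closed = ["b", "ab", "be", "abc", "abce"]
toy_links = precedence_links(toy_closed)
```

The closed itemsets together with these links form the order diagram of the iceberg lattice mentioned in the abstract.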

    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios, and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations. Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen

    Mining Posets from Linear Orders

    There has been much research on the combinatorial problem of generating the linear extensions of a given poset. This paper focuses on the reverse of that problem, where the input is a set of linear orders, and the goal is to construct a poset or set of posets that generates the input. Such a problem finds applications in computational neuroscience, systems biology, paleontology, and physical plant engineering. In this paper, several algorithms are presented for efficiently finding a single poset that generates the input set of linear orders. The variation of the problem in which a minimum set of posets covering the input is sought is also explored. It is found that the problem is polynomially solvable for one class of simple posets (kite(2) posets) but NP-complete for a related class (hammock(2,2,2) posets).
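The single-poset version of this problem can be illustrated with a brute-force sketch. It is not one of the paper's algorithms: the function name `generating_poset` is an assumption, and enumerating all permutations is exponential, so it only suits tiny inputs. It uses the fact that a partial order is the intersection of its linear extensions, so the only candidate is the set of pairs ordered the same way in every input order.

```python
from itertools import permutations


def generating_poset(orders):
    """Return the order relation whose linear extensions are exactly `orders`,
    or None if no single poset generates the input (illustrative brute force)."""
    orders = [tuple(o) for o in orders]
    elements = sorted(orders[0])
    positions = [{x: i for i, x in enumerate(o)} for o in orders]
    # Candidate relation: a precedes b in every input order.
    relation = {(a, b) for a in elements for b in elements
                if a != b and all(p[a] < p[b] for p in positions)}

    def is_extension(perm):
        p = {x: i for i, x in enumerate(perm)}
        return all(p[a] < p[b] for a, b in relation)

    # The candidate generates the input only if it has no extra extensions.
    extensions = {perm for perm in permutations(elements) if is_extension(perm)}
    return relation if extensions == set(orders) else None


# "a" first, then "b" and "c" in either order: one poset generates both orders.
fork = generating_poset([("a", "b", "c"), ("a", "c", "b")])
# An order and its reverse share no pair, so no single poset generates them.
none_found = generating_poset([("a", "b", "c"), ("c", "b", "a")])
```

When the function returns None, the input needs more than one poset to be covered, which is the harder variation the abstract describes.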