Levelwise search of frequent patterns with counting inference

Author: Bastide Yves
Lakhal Lotfi
Pasquier Nicolas
Stumme Gerd
Taouil Rafik
Publication venue: HAL CCSD
Publication date: 01/10/2000

National conference with proceedings and a review committee. In this paper, we address the efficiency of the main phase of most data mining applications: frequent pattern extraction. This problem is mainly related to the number of operations required to count pattern supports in the database, and we propose a new method, called pattern counting inference, that performs as few support counts as possible. Using this method, the support of a pattern is determined without accessing the database whenever possible, using the supports of some of its sub-patterns, called key patterns. This method was implemented in the Pascal algorithm, an optimization of the simple and efficient Apriori algorithm. Experiments comparing Pascal to the Apriori, Close and Max-Miner algorithms, each representative of a frequent pattern discovery strategy, show that Pascal improves the efficiency of frequent pattern extraction from correlated data and that it does not increase execution time when data is weakly correlated.
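The counting-inference idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`support`, `mine_with_inference`) and the in-memory list of transactions are assumptions made for illustration. It relies on the property that a candidate with at least one non-key subset is itself non-key, and that the support of a non-key pattern equals the minimum support of its subsets that are one item smaller.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_with_inference(transactions, min_support):
    """Levelwise mining that skips database counts for non-key candidates.

    Returns (supports, db_counts): `supports` maps each frequent itemset
    to its support, `db_counts` is how many itemsets were actually counted.
    """
    n = len(transactions)  # support of the empty pattern
    items = sorted(set().union(*transactions))
    supports, is_key, db_counts = {}, {}, 0

    level = []
    for item in items:
        single = frozenset([item])
        supp = support(single, transactions)
        db_counts += 1
        if supp >= min_support:
            supports[single] = supp
            # An item present in every transaction is not a key pattern.
            is_key[single] = supp != n
            level.append(single)

    k = 1
    while level:
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        next_level = []
        for cand in candidates:
            subsets = [cand - {x} for x in cand]
            if any(s not in supports for s in subsets):
                continue  # Apriori pruning: some subset is infrequent
            bound = min(supports[s] for s in subsets)
            if all(is_key[s] for s in subsets):
                # Possibly a key pattern: the database must be read.
                supp = support(cand, transactions)
                db_counts += 1
                key = supp != bound
            else:
                # A non-key subset exists: infer the support, no database pass.
                supp, key = bound, False
            if supp >= min_support:
                supports[cand] = supp
                is_key[cand] = key
                next_level.append(cand)
        level = next_level
        k += 1
    return supports, db_counts
```

On correlated data, where some items always occur together, fewer itemsets are counted than are reported, which is the saving the abstract describes.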

INRIA a CCSD electronic archive server

CORON: A Framework for Levelwise Itemset Mining Algorithms

Author: Napoli Amedeo
Szathmary Laszlo
Publication venue: HAL CCSD
Publication date: 01/02/2005

CORON is a framework for levelwise algorithms designed to find frequent and/or frequent closed itemsets in binary contexts. Datasets can differ widely in size, number of objects, number of attributes, density, etc. As there is no single best algorithm for arbitrary datasets, we want to let users try different algorithms and choose the one that best suits their needs.

INRIA a CCSD electronic archive server

An Efficient Hybrid Algorithm for Mining Frequent Closures and Generators

Author: Godin Robert
Napoli Amedeo
Szathmary Laszlo
Valtchev Petko
Publication venue: HAL CCSD
Publication date: 01/10/2008

Conference site: http://cla2008.inf.upol.cz/ (international audience). The effective construction of many association rule bases requires the computation of both frequent closed and frequent generator itemsets (FCIs/FGs). However, these two tasks are rarely combined. Most existing solutions apply a levelwise breadth-first traversal, although depth-first traversal is often superior, depending on data characteristics. Hence, we propose a hybrid algorithm that combines the two traversals. The proposed algorithm, Eclat-Z, extracts frequent itemsets (FIs) in a depth-first way. The algorithm then filters FCIs and FGs among the FIs in a levelwise manner and associates the generators with their closures. With Eclat-Z we present a generic technique for extending an arbitrary FI-miner algorithm so that it also supports the generation of minimal non-redundant association rules. Experimental results indicate that Eclat-Z outperforms pure levelwise methods in most cases.
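The filtering step described here can be sketched in Python as follows. This is a simplified illustration and not Eclat-Z itself: the function name `split_closed_and_generators` and the parameter `n_transactions` are hypothetical, and the sketch assumes it is given the supports of all non-empty frequent itemsets. It uses the standard definitions: an itemset is closed if no proper superset has the same support, and it is a generator if no proper subset has the same support. Comparing against itemsets one item larger or smaller is enough, because support can only decrease as items are added.

```python
def split_closed_and_generators(supports, n_transactions):
    """Filter closed itemsets and generators from all frequent itemsets.

    `supports` maps every non-empty frequent itemset (a frozenset) to its
    support; `n_transactions` is the support of the empty itemset.
    """
    lookup = dict(supports)
    lookup[frozenset()] = n_transactions

    by_size = {}
    for itemset in supports:
        by_size.setdefault(len(itemset), []).append(itemset)

    closed, generators = set(), set()
    for size in sorted(by_size):  # levelwise pass over the itemsets
        larger = by_size.get(size + 1, [])
        for itemset in by_size[size]:
            supp = supports[itemset]
            # Closed: no superset with one more item has the same support.
            if all(supports[s] != supp for s in larger if itemset < s):
                closed.add(itemset)
            # Generator: no subset with one item fewer has the same support.
            if all(lookup[itemset - {x}] != supp for x in itemset):
                generators.add(itemset)

    # The closure of a generator is the closed superset with equal support.
    closure_of = {
        g: next(c for c in closed if g <= c and supports[c] == supports[g])
        for g in generators
    }
    return closed, generators, closure_of
```

The returned `closure_of` mapping pairs each generator with its closure, which is the association the abstract mentions as the basis for building rule bases.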

INRIA a CCSD electronic archive server

Pascal : un algorithme d'extraction des motifs fréquents

Author: Bastide Yves
Lakhal Lotfi
Pasquier Nicolas
Stumme Gerd
Taouil Rafik
Publication venue: Lavoisier
Publication date: 28/03/2002

International audience. In this article we propose the Pascal algorithm, which introduces a new optimization of the reference algorithm Apriori. This optimization is based on pattern counting inference, which uses the concept of key patterns. The support of frequent non-key patterns can be inferred from the support of key patterns without accessing the database. Experimentally, a comparison of Pascal with Apriori, Close and Max-Miner shows its efficiency. Key patterns also make it possible to define informative association rules, which are potentially more useful than the complete set of association rules and far fewer in number.

HAL-UNICE

HAL AMU

HAL Clermont Université

HAL Université de Tours

An Experiment on Mining Chemical Reaction Databases

Author: Berasaluce Sandra
Laurenço Claude
Napoli Amedeo
Niel Gilles
Publication venue: Hermes Science Publishing, London
Publication date: 01/01/2004

International conference with proceedings and a review committee. In this paper, we present an experiment on knowledge discovery in chemical reaction databases. Chemical reactions are the main elements on which synthesis in organic chemistry relies, which is why chemical reaction databases are of primary importance. From a problem-solving perspective, synthesis in organic chemistry must be considered at several levels of abstraction: mainly a strategic level, where general synthesis methods are involved, and a tactical level, where actual chemical reactions are applied. The research presented in this paper aims to discover general synthesis methods from chemical reaction databases in order to design generic and reusable synthesis plans. The knowledge discovery process relies on levelwise frequent itemset search and association rule extraction, but also on chemical knowledge involved in every step of the process. Moreover, the overall process is supervised by a domain expert.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Efficient Mining of Frequent Closures with Precedence Links and Associated Generators

Author: Napoli Amedeo
Szathmary Laszlo
Valtchev Petko
Publication venue: HAL CCSD
Publication date: 01/01/2008

The effective construction of many association rule bases requires the computation of frequent closures, generators, and precedence links between closures. However, these tasks are rarely combined, and no scalable algorithm exists at present for their joint computation. We propose a method that solves this challenging problem in two separate steps. First, we introduce a new algorithm called Touch for finding frequent closed itemsets (FCIs) and their generators (FGs). Touch applies depth-first traversal, and experimental results indicate that this algorithm is highly efficient and outperforms its levelwise competitors. Second, we propose another algorithm called Snow for efficiently extracting the precedence links from the output of Touch. To do so, we apply hypergraph theory. Snow is a generic algorithm that can be used with any FCI/FG-miner. Together, Touch and Snow provide a complete solution for constructing iceberg lattices. Furthermore, due to their modular design, parts of the algorithms can also be used independently.
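To make the notion of precedence links concrete, here is a deliberately naive Python sketch. It is not Snow and does not use hypergraph theory: it simply assumes that a link joins a closed itemset to each closed proper superset with no other closed itemset strictly in between, and finds such pairs by brute force. The function name `precedence_links` is hypothetical.

```python
def precedence_links(closed):
    """Return the pairs (x, y) of closed itemsets where y immediately follows x.

    `closed` is an iterable of frozensets. The pair (x, y) is a link when
    x is a proper subset of y and no closed z satisfies x < z < y.
    """
    closed = list(closed)
    links = set()
    for x in closed:
        uppers = [y for y in closed if x < y]  # closed proper supersets of x
        for y in uppers:
            # Keep y only if no other upper element lies strictly below it.
            if not any(z < y for z in uppers):
                links.add((x, y))
    return links
```

This brute-force check compares every pair of closed itemsets, so it only suits small examples; the abstract's point is precisely that a scalable method is needed for real datasets.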

INRIA a CCSD electronic archive server

On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

Author: Amarilli Antoine
Amsterdamer Yael
Milo Tova
Publication venue
Publication date: 16/12/2013

We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but only exists in human knowledge. We provide examples of such scenarios and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent or not, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations. Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen

arXiv.org e-Print Archive

CiteSeerX

Mining Posets from Linear Orders

Author: Fernandez Proceso L.
Heath Lenwood S.
Ramakrishnan Naren
Vergara John Paul C.
Publication venue
Publication date: 01/01/2009

There has been much research on the combinatorial problem of generating the linear extensions of a given poset. This paper focuses on the reverse of that problem, where the input is a set of linear orders and the goal is to construct a poset or set of posets that generates the input. Such a problem finds applications in computational neuroscience, systems biology, paleontology, and physical plant engineering. In this paper, several algorithms are presented for efficiently finding a single poset that generates the input set of linear orders. The variation of the problem in which a minimum set of posets covering the input is sought is also explored. It is found that the problem is polynomially solvable for one class of simple posets (kite(2) posets) but NP-complete for a related class (hammock(2,2,2) posets).
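The single-poset version of this problem can be illustrated with a brute-force Python sketch. It is not one of the paper's algorithms, and the function names are hypothetical. It assumes that a poset generates the input when its set of linear extensions is exactly the input set, and that all input orders are over the same elements. Because a partial order is the intersection of its linear extensions, the only possible candidate is the set of pairs ordered the same way in every input order; the sketch builds that candidate and checks it by enumerating permutations.

```python
from itertools import permutations


def common_relations(orders):
    """Pairs (a, b) such that a precedes b in every input order."""
    first = orders[0]
    pairs = {(a, b) for i, a in enumerate(first) for b in first[i + 1:]}
    for order in orders[1:]:
        pos = {x: i for i, x in enumerate(order)}
        pairs = {(a, b) for a, b in pairs if pos[a] < pos[b]}
    return pairs


def linear_extensions(elements, relations):
    """All orderings of `elements` that respect every pair in `relations`."""
    result = set()
    for perm in permutations(elements):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            result.add(perm)
    return result


def generating_poset(orders):
    """Return the relation set of a poset generating `orders`, or None.

    `orders` is a non-empty list of sequences over the same elements.
    """
    orders = [tuple(o) for o in orders]
    relations = common_relations(orders)
    if linear_extensions(orders[0], relations) == set(orders):
        return relations
    return None
```

For example, `generating_poset(["abc", "acb"])` yields the relations a before b and a before c, while `generating_poset(["abc", "cba"])` returns `None` because no single poset has exactly those two linear extensions. Enumerating all permutations is exponential, so this only works for very small inputs.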

Computer Science Technical Reports @Virginia Tech