    On the Complexity of Mining Itemsets from the Crowd Using Taxonomies

    We study the problem of frequent itemset mining in domains where data is not recorded in a conventional database but exists only in human knowledge. We provide examples of such scenarios and present a crowdsourcing model for them. The model uses the crowd as an oracle to find out whether an itemset is frequent, and relies on a known taxonomy of the item domain to guide the search for frequent itemsets. In the spirit of data mining with oracles, we analyze the complexity of this problem in terms of (i) crowd complexity, which measures the number of crowd questions required to identify the frequent itemsets; and (ii) computational complexity, which measures the computational effort required to choose the questions. We provide lower and upper complexity bounds in terms of the size and structure of the input taxonomy, as well as the size of a concise description of the output itemsets. We also provide constructive algorithms that achieve the upper bounds, and consider more efficient variants for practical situations. Comment: 18 pages, 2 figures. To be published to ICDT'13. Added missing acknowledgemen
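
As a rough illustration of the crowd-complexity measure described above, the following sketch runs a plain levelwise search against a frequency oracle and counts how many questions it asks. It is a minimal sketch under stated assumptions, not the algorithm of the paper: it ignores the taxonomy entirely, and the names `mine_with_oracle` and `is_frequent` are hypothetical.

```python
from itertools import combinations


def mine_with_oracle(items, is_frequent):
    """Levelwise search driven by a yes/no oracle (hypothetical helper).

    Returns (maximal frequent itemsets, number of oracle questions asked).
    The taxonomy used in the paper is NOT modelled here.
    """
    questions = 0
    frequent = []
    level = [frozenset([item]) for item in sorted(items)]
    while level:
        survivors = []
        for candidate in level:
            questions += 1  # one crowd question per candidate itemset
            if is_frequent(candidate):
                survivors.append(candidate)
        frequent.extend(survivors)
        # Build the next level: a set one item larger is a candidate only
        # if every subset obtained by dropping one item was found frequent.
        survivor_set = set(survivors)
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                union - {x} in survivor_set for x in union
            ):
                next_level.add(union)
        level = sorted(next_level, key=sorted)
    # Keep only itemsets with no frequent proper superset.
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return maximal, questions


# Simulated crowd: an itemset is frequent if it fits inside {a,b,c} or {c,d}.
oracle = lambda s: s <= {"a", "b", "c"} or s <= {"c", "d"}
maximal, questions = mine_with_oracle("abcd", oracle)
print(sorted(sorted(s) for s in maximal), questions)
```

In this toy run the search asks 11 questions (4 singletons, 6 pairs, 1 triple) to recover the two maximal itemsets, which gives a feel for why bounding the number of questions matters when each one is put to a person.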

    Etude, représentation et applications des traverses minimales d'un hypergraphe

    This thesis belongs to the field of hypergraph theory and focuses on the minimal transversals of hypergraphs. Interest in extracting minimal transversals has grown markedly over the past several years, as shown by the number of algorithms proposed in the literature, mainly because of the solutions that minimal transversals offer in application areas such as databases, artificial intelligence, e-commerce, and the semantic web. Given this wide range of applications and the interest they generate, the objective of this thesis is to explore new applications of minimal transversals while proposing methods to optimize their extraction. This has led to three contributions. The first takes advantage of the emergence of Web 2.0 and, with it, social networks, by using minimal transversals to detect important actors within these networks. The second addresses the reduction of the number of minimal transversals of a hypergraph. Because this number is very high, a concise and exact representation of the minimal transversals is proposed, based on the construction of an irredundant hypergraph from which the irredundant minimal transversals of the initial hypergraph are computed. An application of this representation to the problem of inferring functional dependencies is presented to illustrate its usefulness. The last contribution decomposes a hypergraph into partial hypergraphs; the minimal transversals of these partial hypergraphs are computed, and their Cartesian product generates the set of transversals of the hypergraph. The experimental studies carried out have shown the value of these approaches.
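
For readers unfamiliar with the central object of this abstract, a minimal transversal is a set of vertices that intersects every hyperedge and has no proper subset with the same property. The sketch below enumerates them by brute force in order of increasing size. It is only an illustration of the definition, with exponential cost, and is not one of the optimized methods of the thesis; the function name `minimal_transversals` is an assumption.

```python
from itertools import combinations


def minimal_transversals(hyperedges):
    """Brute-force enumeration of minimal transversals (illustrative only)."""
    edges = [frozenset(e) for e in hyperedges]
    vertices = sorted(set().union(*edges)) if edges else []
    found = []
    # Candidates are visited by increasing size, so any transversal that
    # contains an earlier result cannot be minimal and is skipped.
    for size in range(len(vertices) + 1):
        for combo in combinations(vertices, size):
            candidate = frozenset(combo)
            if any(t <= candidate for t in found):
                continue
            if all(candidate & e for e in edges):  # hits every hyperedge
                found.append(candidate)
    return found


# Path-like hypergraph with hyperedges {a,b}, {b,c}, {c,d}.
result = minimal_transversals([{"a", "b"}, {"b", "c"}, {"c", "d"}])
print(sorted(sorted(t) for t in result))  # [['a', 'c'], ['b', 'c'], ['b', 'd']]
```

Even on small inputs the number of minimal transversals can grow quickly, which is the motivation the abstract gives for a concise representation.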

    Frequent Itemset Border Approximation by Dualization

    The FIBAD approach is introduced to compute approximate borders of frequent itemsets by leveraging dualization and the computation of approximate minimal transversals of hypergraphs. The distinctive feature of FIBAD's theoretical foundations is the approximate dualization, in which a new function ~f is defined to compute the approximate negative border. From a methodological point of view, the function ~f is implemented by the method AMTHR, which consists of a reduction of the hypergraph and a computation of its minimal transversals. For evaluation purposes, we study the sensitivity of FIBAD to AMTHR by replacing the latter with two other algorithms that compute approximate minimal transversals. We also compare our approximate dualization-based method with an existing approach that computes the approximate borders directly, without dualization. The experimental results show that our method outperforms the other methods, as it produces borders of the highest quality.
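
To make the notion of dualization concrete, the sketch below shows the classical exact version: the negative border (the minimal infrequent itemsets) is obtained as the minimal transversals of the hypergraph whose hyperedges are the complements of the positive border (the maximal frequent itemsets). This is a hedged illustration of exact dualization only. It does not implement the approximate function ~f or AMTHR, and the helper names are assumptions.

```python
from itertools import combinations


def minimal_transversals(edges, vertices):
    """Brute-force minimal transversals over a given vertex set."""
    edges = [frozenset(e) for e in edges]
    found = []
    for size in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), size):
            candidate = frozenset(combo)
            if any(t <= candidate for t in found):
                continue  # contains a smaller transversal, not minimal
            if all(candidate & e for e in edges):
                found.append(candidate)
    return found


def negative_border(positive_border, items):
    """Exact dualization: minimal transversals of the complements."""
    items = frozenset(items)
    complements = [items - frozenset(s) for s in positive_border]
    return minimal_transversals(complements, items)


# Maximal frequent itemsets {a,b,c} and {c,d} over the items a, b, c, d.
border = negative_border([{"a", "b", "c"}, {"c", "d"}], "abcd")
print(sorted(sorted(s) for s in border))  # [['a', 'd'], ['b', 'd']]
```

Here {a,d} and {b,d} are the smallest itemsets contained in neither maximal frequent itemset. An approximate method would relax the transversal computation in this step.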