92 research outputs found

Query Rewriting in Itemset Mining

Authors: Botta Marco, Esposito Roberto, Meo Rosa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Abstract. In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced, constraint-based languages for data mining, where constraints have been specifically introduced to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a technique for query answering that consists of rewriting the query in terms of the union and intersection of the result sets of other queries, previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We then present sufficient conditions for the optimization to apply and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. We show some experiments on an initial prototype of an optimizer, which demonstrate that this approach to query answering is not only viable but in many practical cases absolutely necessary, since it drastically reduces the execution time.

CiteSeerX

Institutional Research Information System University of Turin

Using and extending itemsets in data mining : query approximation, dense itemsets, and tiles

Author: Seppänen Jouni K.
Publication venue: Teknillinen korkeakoulu
Publication date: 31/05/2006

Frequent itemsets are one of the best known concepts in data mining, and there is active research in itemset mining algorithms. An itemset is frequent in a database if its items co-occur in sufficiently many records. This thesis addresses two questions related to frequent itemsets. The first question is raised by a method for approximating logical queries by an inclusion-exclusion sum truncated to the terms corresponding to the frequent itemsets: how good are the approximations thereby obtained? The answer is twofold: in theory, the worst-case bound for the algorithm is very large, and a construction is given that shows the bound to be tight; but in practice, the approximations tend to be much closer to the correct answer than in the worst case. While some other algorithms based on frequent itemsets yield even better approximations, they are not as widely applicable. The second question concerns extending the definition of frequent itemsets to relax the requirement of perfect co-occurrence: highly correlated items may form an interesting set, even if they never co-occur in a single record. The problem is to formalize this idea in a way that still admits efficient mining algorithms. Two different approaches are used. First, dense itemsets are defined in a manner similar to the usual frequent itemsets and can be found using a modification of the original itemset mining algorithm. Second, tiles are defined in a different way so as to form a model for the whole data, unlike frequent and dense itemsets. A heuristic algorithm based on spectral properties of the data is given and some of its properties are explored.
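
The two notions named in the abstract above (an itemset being frequent, and an inclusion-exclusion sum truncated to frequent itemsets) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names, the brute-force enumeration, the toy records, and the restriction to "at least one of these items" queries are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, records):
    """Count the records that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for record in records if items <= record)


def frequent_itemsets(records, min_support):
    """Brute-force enumeration (illustrative only, exponential in the
    number of items) of all itemsets with support >= min_support."""
    universe = sorted(set().union(*records))
    result = {}
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            count = support(candidate, records)
            if count >= min_support:
                result[candidate] = count
    return result


def truncated_disjunction_count(items, known_supports):
    """Estimate how many records contain at least one of `items` using an
    inclusion-exclusion sum that keeps only the terms whose itemset is in
    `known_supports` (hypothetical helper; missing terms are dropped)."""
    total = 0
    for size in range(1, len(items) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(sorted(items), size):
            if subset in known_supports:
                total += sign * known_supports[subset]
    return total


# Toy database: each record is a set of items.
records = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}, {"c"}]
frequent = frequent_itemsets(records, min_support=2)
estimate = truncated_disjunction_count({"a", "c"}, frequent)
exact = sum(1 for record in records if record & {"a", "c"})
```

In this toy database the itemset {a, c} occurs in only one record, so it is not frequent at a threshold of 2 and its term is dropped from the sum; the estimate is therefore 6 while the exact count is 5, a small instance of the approximation error the thesis studies.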

Aaltodoc Publication Archive

Automatically extracting news articles from the Internet

Authors: Jasselette Arnaud, Vanderwhale Mathieu
Publication date: 01/01/2005

Repository of the University of Namur

Définition des familles de produits à l'aide de la logique floue [Definition of product families using fuzzy logic]

Author: Barajas Vazques Marco Antonio
Publication date: 01/12/2009

In this thesis, the main contribution concerns the design of product families by applying fuzzy logic in order to improve the decision-making process. We consider that forming product families enables companies to offer a wide variety of products, satisfying different types of customers in the target market while avoiding the costly diversification of designing and manufacturing customized products for each customer. Fuzzy logic allows information to be entered in the linguistic terms that people commonly use. That is, it allows information to be considered in a form closer to that expressed by customers, and it is not limited to handling binary variables as Boolean logic is. Through the formulation of different membership functions, fuzzy logic can evaluate a range of answers for a variable instead of just a “yes” or a “no”. After reviewing the literature on fuzzy logic and product family development, we concluded that the decision-making process is fundamental to the effective formation of product families, and that fuzzy ranking is the basis of decision-making processes aided by fuzzy logic. In this work, various fuzzy logic-aided tools have been developed and applied toward the main objective. First, a fuzzy ranking procedure has been improved to permit the evaluation of fuzzy preference relations among several fuzzy numbers with different membership functions. This procedure is supported by the definition of twenty-nine general cases, which are enough to cover all possible situations between two normal fuzzy numbers. These general cases have also been presented as a framework that facilitates the inclusion of other membership functions. Later, regarding the design of product families, different tools have been developed, implemented, and integrated into a global methodology for forming product families. These tools include: a ranking procedure for fuzzy decision-making in product design to compare different products, a method to select products based on the fuzzy preferences of customers, an iterative method to configure products for specific customers, a method to configure different products to satisfy the different market segments, and finally the integration of all these tools in a global methodology for designing product families using fuzzy logic.
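
The abstract's point that a membership function returns a degree rather than a plain "yes" or "no" can be illustrated with a minimal sketch. The triangular shape, the function name, and the parameter names below are assumptions chosen for illustration; the abstract does not say which membership functions the thesis uses.

```python
def triangular_membership(x, left, peak, right):
    """Degree (between 0.0 and 1.0) to which x belongs to a triangular
    fuzzy number that rises from `left` to `peak` and falls to `right`.
    Assumes left < peak < right (hypothetical parameterization)."""
    if x <= left or x >= right:
        return 0.0  # outside the support: not a member at all
    if x <= peak:
        return (x - left) / (peak - left)  # rising edge
    return (right - x) / (right - peak)  # falling edge


# A Boolean test would answer only True or False; the fuzzy version grades
# how well a value matches a linguistic term such as "medium" on a 0-10 scale.
degrees = [triangular_membership(x, 0, 5, 10) for x in (0, 2.5, 5, 7.5)]
```

Here the values 2.5 and 7.5 both belong to the term with degree 0.5, while 5 belongs fully and 0 not at all.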

PolyPublie