327 research outputs found

    Optimal constraint-based decision tree induction from itemset lattices

    No full text
    International audienceIn this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deeply into the pattern mining process, and allows us to improve the state-of-the-art of optimal decision tree induction

    Comparison of different algorithms for exploting the hidden trends in data sources

    Get PDF
    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2003Includes bibliographical references (leaves: 92-97)Text in English; Abstract: Turkish and English97 leavesThe growth of large-scale transactional databases, time-series databases and other kinds of databases has been giving rise to the development of several efficient algorithms that cope with the computationally expensive task of association rule mining.In this study, different algorithms, Apriori, FP-tree and CHARM, for exploiting the hidden trends such as frequent itemsets, frequent patterns, closed frequent itemsets respectively, were discussed and their performances were evaluated. The perfomances of the algorithms were measured at different support levels, and the algorithms were tested on different data sets (on both synthetic and real data sets). The algorihms were compared according to their, data preparation performances, mining performance, run time performances and knowledge extraction capabilities.The Apriori algorithm is the most prevalent algorithm of association rule mining which makes multiple passes over the database aiming at finding the set of frequent itemsets for each level. The FP-Tree algorithm is a scalable algorithm which finds the crucial information as regards the complete set of prefix paths, conditional pattern bases and frequent patterns by using a compact FP-Tree based mining method. The CHARM is a novel algorithm which brings remarkable improvements over existing association rule mining algorithms by proving the fact that mining the set of closed frequent itemsets is adequate instead of mining the set of all frequent itemsets.Related to our experimental results, we conclude that the Apriori algorithm demonstrates a good performance on sparse data sets. The Fp-tree algorithm extracts less association in comparison to Apriori, however it is completelty a feasable solution that facilitates mining dense data sets at low support levels. On the other hand, the CHARM algorithm is an appropriate algorithm for mining closed frequent itemsets (a substantial portion of frequent itemsets) on both sparse and dense data sets even at low levels of support

    Dynamic causal mining

    Get PDF
    Causality plays a central role in human reasoning, in particular, in common human decision-making, by providing a basis for strategy selection. The main aim of the research reported in this thesis is to develop a new way to identify dynamic causal relationships between attributes of a system. The first part of the thesis introduces the development of a new data mining algorithm, called Dynamic Causal Mining (DCM), which extracts rules from data sets based on simultaneous time stamps. The rules derived can be combined into policies, which can simulate the future behaviour of systems. New rules can be added to the policies depending on the degree of accuracy. In addition, facilities to process categorical or numerical attributes directly and approaches to prune the rule set efficiently are implemented in the DCM algorithm. The second part of the thesis discusses how to improve the DCM algorithm in order to identify delay and feedback relationships. Fuzzy logic is applied to manage the rules and policies flexibly and accurately during the learning process and help the algorithm to find feasible solutions. The third part of the thesis describes the application of the suggested algorithm to a problem in the game-theoretic domain. This part concludes with the suggestion to use concept lattices as a method to represent and structure the discovered knowledge

    Size of random Galois lattices and number of frequent itemsets

    Get PDF
    19 pagesWe compute the mean and the variance of the size of the Galois lattice built from a random matrix with i.i.d. Bernoulli(p) entries. Then, obseving that closed frequent itemsets are in bijection with winning coalitions, we compute the mean and the variance of the number of closed frequent itemsets. This can be of interest for mining association rules

    Spatial regression in large datasets: problem set solution

    Get PDF
    In this dissertation we investigate a possible attempt to combine the Data Mining methods and traditional Spatial Autoregressive models, in the context of large spatial datasets. We start to considere the numerical difficulties to handle massive datasets by the usual approach based on Maximum Likelihood estimation for spatial models and Spatial Two-Stage Least Squares. So, we conduct an experiment by Monte Carlo simulations to compare the accuracy and computational complexity for decomposition and approximation techniques to solve the problem of computing the Jacobian in spatial models, for various regular lattice structures. In particular, we consider one of the most common spatial econometric models: spatial lag (or SAR, spatial autoregressive model). Also, we provide new evidences in the literature, by examining the double effect on computational complexity of these methods: the influence of "size effect" and "sparsity effect". To overcome this computational problem, we propose a data mining methodology as CART (Classification and Regression Tree) that explicitly considers the phenomenon of spatial autocorrelation on pseudo-residuals, in order to remove this effect and to improve the accuracy, with significant saving in computational complexity in wide range of spatial datasets: realand simulated data

    Mining Closed Itemsets for Coherent Rules: An Inference Analysis Approach

    Get PDF
    Past observations have shown that a frequent item set mining algorithm are alleged to mine the closed ones because the finish offers a compact and a whole progress set and higher potency. Anyhow, the most recent closed item set mining algorithms works with candidate maintenance combined with check paradigm that is dear in runtime likewise as area usage when support threshold is a smaller amount or the item sets gets long. Here, we show, PEPP with inference analysis that could be a capable approach used for mining closed sequences for coherent rules while not candidate. It implements a unique sequence closure checking format with inference analysis that based mostly on Sequence Graph protruding by an approach labeled Parallel Edge projection and pruning in brief will refer as PEPP. We describe a novel inference analysis approach to prune patterns that tends to derive coherent rules. A whole observation having sparse and dense real-life information sets proved that PEPP with inference analysis performs larger compared to older algorithms because it takes low memory and is quicker than any algorithms those cited in literature frequently

    Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

    Get PDF
    International audienceIn pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are not frequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying these constraints.In this paper we introduce the notion of "generalized monotonicity" and Sofia algorithm that allow generating best patterns in polynomial time for some nonmonotonic constraints modulo constraint computation and pattern extension operations. In particular, this algorithm is polynomial for data on itemsets and interval tuples. In this paper we consider stability and delta-measure which are nonmonotonic constraints and apply them to interval tuple datasets. In the experiments, we compute best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches
    • …
    corecore