47 research outputs found

    An experiment with association rules and classification: post-bagging and conviction

    In this paper we study a new technique we call post-bagging, which consists in resampling parts of a classification model rather than the data. We do this with a particular kind of model: large sets of classification association rules, in combination with ordinary best-rule and weighted voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, under the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.
    Funding: Programa de Financiamento Plurianual de Unidades de I&D; Comunidade Europeia (CE), Fundo Europeu de Desenvolvimento Regional (FEDER); Fundação para a Ciência e a Tecnologia (FCT), POSI/SRI/39630/2001/Class Project.
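
    The rule metrics compared above have standard closed forms: conf(A→B) = supp(A∪B)/supp(A), lift = conf/supp(B), and conviction = (1 - supp(B))/(1 - conf). The sketch below spells out these textbook definitions; the function names and the toy transactions are illustrative, and the post-bagging procedure itself is not reproduced here.

```python
# Standard association-rule metrics (confidence, lift, conviction).
# Names and the toy data are illustrative, not taken from the paper.

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence normalised by the consequent's baseline support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

def conviction(antecedent, consequent, transactions):
    """(1 - supp(C)) / (1 - conf(A -> C)); infinite for exact rules."""
    conf = confidence(antecedent, consequent, transactions)
    if conf == 1.0:
        return float("inf")
    return (1.0 - support(consequent, transactions)) / (1.0 - conf)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                {"b", "c"}, {"a", "b", "c"}]
print(conviction({"a"}, {"b"}, transactions))  # 0.8 on this toy data
```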

    On the complexity of strongly connected components in directed hypergraphs

    We study the complexity of some algorithmic problems on directed hypergraphs and their strongly connected components (SCCs). The main contribution is an almost linear time algorithm computing the terminal strongly connected components (i.e. SCCs which do not reach any components but themselves). "Almost linear" here means that the complexity of the algorithm is linear in the size of the hypergraph up to a factor α(n), where α is the inverse of the Ackermann function and n is the number of vertices. Our motivation to study this problem arises from a recent application of directed hypergraphs to computational tropical geometry. We also discuss the problem of computing all SCCs. We establish a superlinear lower bound on the size of the transitive reduction of the reachability relation in directed hypergraphs, showing that it is combinatorially more complex than in directed graphs. Besides, we prove a linear time reduction from the well-studied problem of finding all minimal sets among a given family to the problem of computing the SCCs. Only subquadratic time algorithms are known for the former problem. These results strongly suggest that the problem of computing the SCCs is harder in directed hypergraphs than in directed graphs.
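
    The inverse-Ackermann factor α(n) is the hallmark of a disjoint-set (union-find) structure with union by rank and path compression. Whether the authors' algorithm relies on exactly this structure is an assumption; the sketch below only illustrates where such a factor typically originates.

```python
# Disjoint-set (union-find) with union by rank and path halving
# (a form of path compression), the textbook source of the
# amortised alpha(n) factor quoted in the abstract.

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: point every other visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the shallower tree under the deeper
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet(5)
ds.union(0, 1)
ds.union(1, 2)
print(ds.find(0) == ds.find(2))  # True: 0, 1, 2 are now one component
```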

    Using CSP Look-Back Techniques to Solve Exceptionally Hard SAT Instances

    While CNF propositional satisfiability (SAT) is a subclass of the more general constraint satisfaction problem (CSP), conventional wisdom has it that some well-known CSP look-back techniques -- including backjumping and learning -- are of little use for SAT. We enhance the Tableau SAT algorithm of Crawford and Auton with look-back techniques and evaluate its performance on problems specifically designed to challenge it. The Random 3-SAT problem space has commonly been used to benchmark SAT algorithms because consistently difficult instances can be found near a region known as the phase transition. We modify Random 3-SAT in two ways that make instances even harder. First, we evaluate problems with structural regularities and find that CSP look-back techniques offer little advantage. Second, we evaluate problems in which a hard unsatisfiable instance of medium size is embedded in a larger instance, and we find the look-back enhancements to be indispensable. Without them, most instances are "exceptionally hard" -- orders of magnitude harder than typical Random 3-SAT instances with the same surface characteristics.
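
    To make the terminology concrete, below is a toy DPLL-style solver with a crude form of look-back learning: on each conflict it records the negation of the current decision literals as a new clause. This is far weaker than the learning schemes used in Tableau-class solvers and is only an illustrative sketch under that assumption; none of the names come from the paper.

```python
# Toy DPLL with unit propagation and decision-level clause learning.
# Literals are nonzero ints; a clause is a list of literals.

def unit_propagate(clauses, assignment):
    """Repeatedly assign forced literals; return False on a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue  # clause already satisfied
            unassigned = [l for l in clause if -l not in assignment]
            if not unassigned:
                return False  # every literal falsified: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # forced (unit) literal
                changed = True
    return True

def dpll(clauses, assignment=frozenset(), decisions=()):
    local = set(assignment)
    if not unit_propagate(clauses, local):
        if decisions:
            # Crude learning: this combination of decisions failed,
            # so record its negation as a new clause.
            clauses.append([-d for d in decisions])
        return None
    variables = {abs(l) for c in clauses for l in c}
    free = variables - {abs(l) for l in local}
    if not free:
        return local  # all variables assigned, all clauses satisfied
    v = min(free)
    for lit in (v, -v):
        result = dpll(clauses, frozenset(local | {lit}), decisions + (lit,))
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]))  # e.g. {1, 2, 3}
```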

    DisClose: Discovering colossal closed itemsets via a memory efficient compact row-tree

    A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. With running time that grows exponentially as average row length increases, mining such datasets renders most conventional algorithms impractical. Unfortunately, large cardinality itemsets are likely to be more informative than small cardinality itemsets in this type of dataset. This paper proposes an approach, termed DisClose, to extract large cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large cardinality itemsets are enumerated first, followed by smaller ones. In addition, we utilize a minimum cardinality threshold to further reduce the search space. Experimental results show that DisClose can extract colossal closed itemsets from such datasets, even for low support thresholds. The algorithm discovers closed itemsets directly, without needing to check whether each new closed itemset has previously been found.
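
    A standard fact makes the abstract's filtering easy to illustrate: the closed itemsets of a dataset are exactly the intersections of non-empty sets of transactions. The brute-force sketch below enumerates them via a pairwise-intersection fixpoint and applies the minimum-cardinality and minimum-support thresholds the abstract describes; it does not reproduce DisClose's Compact Row-Tree and is not its algorithm.

```python
# Enumerate closed itemsets as intersections of transactions, then
# keep only "colossal" ones: at least min_card items and min_sup rows.
# Exponential in the worst case; a sketch for toy data only.

def closed_itemsets(transactions, min_card, min_sup):
    rows = [frozenset(t) for t in transactions]
    closures = set(rows)
    # Close the collection under pairwise intersection (fixpoint).
    while True:
        new = {a & b for a in closures for b in closures} - closures
        new.discard(frozenset())
        if not new:
            break
        closures |= new

    def sup(itemset):
        return sum(itemset <= r for r in rows)

    return [c for c in closures
            if len(c) >= min_card and sup(c) >= min_sup]

rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]
for c in sorted(closed_itemsets(rows, min_card=2, min_sup=2), key=sorted):
    print(sorted(c))  # {a,b}, {a,b,c}, {a,b,d}, {c,d}
```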

    A Decremental Approach for Mining Frequent Itemsets from Uncertain Data

    (In)Effectiveness of Look-Ahead Techniques in a Modern SAT Solver

    Giunchiglia, Enrico; Maratea, Marco; Tacchella, Armando