An experiment with association rules and classification: post-bagging and conviction
In this paper we study a new technique we call post-bagging, which consists in
resampling parts of a classification model rather than the data. We do
this with a particular kind of model: large sets of classification association rules, and
in combination with ordinary best rule and weighted voting approaches.
We empirically evaluate the effects of the technique in terms of classification accuracy.
We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.
Programa de Financiamento Plurianual de Unidades de I & D. Comunidade Europeia (CE). Fundo Europeu de Desenvolvimento Regional (FEDER). Fundação para a Ciência e a Tecnologia (FCT) - POSI/SRI/39630/2001/Class Project
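Three of the rule metrics named in the abstract above have standard textbook definitions in terms of supports. The sketch below computes them for a rule A -> C; the function name `rule_metrics` and its parameters are illustrative assumptions, not taken from the paper, and the paper's own experimental setup may differ.

```python
def rule_metrics(supp_a, supp_c, supp_ac):
    """Return (confidence, lift, conviction) of the rule A -> C.

    supp_a, supp_c and supp_ac are relative supports (fractions of all
    transactions) of the antecedent, the consequent, and both together.
    Hypothetical helper using the textbook definitions.
    """
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1.0:
        conviction = float("inf")
    else:
        conviction = (1.0 - supp_c) / (1.0 - confidence)
    return confidence, lift, conviction


# A covers 50% of transactions, C covers 50%, both together 37.5%.
print(rule_metrics(0.5, 0.5, 0.375))  # (0.75, 1.5, 2.0)
```

Unlike lift, conviction is asymmetric in A and C, which is one reason it can rank rules differently from the other metrics.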
On the complexity of strongly connected components in directed hypergraphs
We study the complexity of some algorithmic problems on directed hypergraphs
and their strongly connected components (SCCs). The main contribution is an
almost linear time algorithm computing the terminal strongly connected
components (i.e. SCCs which do not reach any components but themselves).
"Almost linear" here means that the complexity of the algorithm is linear in
the size of the hypergraph up to a factor alpha(n), where alpha is the inverse
of the Ackermann function, and n is the number of vertices. Our motivation to study
this problem arises from a recent application of directed hypergraphs to
computational tropical geometry.
We also discuss the problem of computing all SCCs. We establish a superlinear
lower bound on the size of the transitive reduction of the reachability
relation in directed hypergraphs, showing that it is combinatorially more
complex than in directed graphs. Besides, we prove a linear time reduction from
the well-studied problem of finding all minimal sets among a given family to
the problem of computing the SCCs. Only subquadratic time algorithms are known
for the former problem. These results strongly suggest that the problem of
computing the SCCs is harder in directed hypergraphs than in directed graphs.
Comment: v1: 32 pages, 7 figures; v2: revised version, 34 pages, 7 figures
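To make the notion of reachability behind these SCCs concrete, here is a minimal sketch. It assumes the usual convention for directed hypergraphs: a hyperarc (tail, head) makes every head vertex reachable once all of its tail vertices are reachable, and tails are non-empty. The abstract does not spell this convention out, and the function name `reachable` is illustrative. This is plain forward propagation, not the almost linear time algorithm of the paper.

```python
from collections import deque


def reachable(num_vertices, hyperarcs, source):
    """Return the set of vertices reachable from `source`.

    hyperarcs is a list of (tail, head) pairs of vertex ids in
    range(num_vertices). Assumed rule: a hyperarc fires, making its head
    reachable, once every vertex of its (non-empty) tail is reachable.
    """
    # Number of tail vertices of each hyperarc not yet reached.
    missing = [len(set(tail)) for tail, _ in hyperarcs]
    # For each vertex, the hyperarcs that have it in their tail.
    uses = [[] for _ in range(num_vertices)]
    for i, (tail, _) in enumerate(hyperarcs):
        for v in set(tail):
            uses[v].append(i)

    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for i in uses[v]:
            missing[i] -= 1
            if missing[i] == 0:  # whole tail reached: the hyperarc fires
                for w in hyperarcs[i][1]:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
    return seen


arcs = [({0}, {1}), ({1}, {0}), ({0, 1}, {2}), ({2, 3}, {0})]
print(reachable(4, arcs, 0))  # {0, 1, 2}
print(reachable(4, arcs, 2))  # {2}
```

In this example vertices 0 and 1 reach each other and so share an SCC, while vertex 2 reaches only itself, so {2} is a terminal SCC in the sense described above.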
Using CSP Look-Back Techniques to Solve Exceptionally Hard SAT Instances
While CNF propositional satisfiability (SAT) is a sub-class of the more general constraint satisfaction problem (CSP), conventional wisdom has it that some well-known CSP look-back techniques -- including backjumping and learning -- are of little use for SAT. We enhance the Tableau SAT algorithm of Crawford and Auton with look-back techniques and evaluate its performance on problems specifically designed to challenge it. The Random 3-SAT problem space has commonly been used to benchmark SAT algorithms because consistently difficult instances can be found near a region known as the phase transition. We modify Random 3-SAT in two ways which make instances even harder. First, we evaluate problems with structural regularities and find that CSP look-back techniques offer little advantage. Second, we evaluate problems in which a hard unsatisfiable instance of medium size is embedded in a larger instance, and we find the look-back enhancements to be indispensable. Without them, most instances are "exceptionally hard" -- orders of magnitude harder than typical Random 3-SAT instances with the same surface characteristics.
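For readers unfamiliar with the baseline Random 3-SAT model mentioned above, a minimal generator might look as follows. The function name `random_3sat`, the DIMACS-style signed-integer literals, and the example ratio of about 4.3 clauses per variable (a commonly cited location of the phase transition, not a figure from this abstract) are all assumptions. The two harder variants studied in the paper are not reproduced here.

```python
import random


def random_3sat(num_vars, num_clauses, seed=None):
    """Generate a Random 3-SAT instance as a list of 3-literal clauses.

    Each clause picks three distinct variables uniformly at random and
    negates each with probability 1/2. Literals are signed integers
    (+v for the variable, -v for its negation).
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        clauses.append(clause)
    return clauses


# 50 variables at roughly 4.3 clauses per variable (assumed ratio).
instance = random_3sat(num_vars=50, num_clauses=215, seed=0)
print(len(instance), instance[0])
```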
DisClose: Discovering colossal closed itemsets via a memory efficient compact row-tree
A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. With exponentially increasing running time as average row length increases, mining such datasets renders most conventional algorithms impractical. Unfortunately, large cardinality itemsets are likely to be more informative than small cardinality itemsets in this type of dataset. This paper proposes an approach, termed DisClose, to extract large cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large cardinality itemsets are enumerated first followed by smaller ones. In addition, we utilize a minimum cardinality threshold to further reduce the search space. Experimental results show that DisClose can achieve extraction of colossal closed itemsets in the discovered datasets, even for low support thresholds. The algorithm immediately discovers closed itemsets without needing to check if each new closed itemset has previously been found.
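As a reminder of what "closed" means here, the sketch below uses the standard definition: an itemset is closed when no proper superset occurs in exactly the same rows, which is equivalent to the itemset equalling the intersection of all rows that contain it. The helper names `closure` and `is_closed` are illustrative; this brute-force check is not the Compact Row-Tree approach of DisClose.

```python
def closure(itemset, rows):
    """Return the closure of `itemset`: the items shared by every row containing it."""
    itemset = frozenset(itemset)
    covering = [frozenset(r) for r in rows if itemset <= frozenset(r)]
    if not covering:
        return itemset  # unsupported itemset: left unchanged in this sketch
    return frozenset.intersection(*covering)


def is_closed(itemset, rows):
    """True when `itemset` equals its own closure."""
    return closure(itemset, rows) == frozenset(itemset)


rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
print(sorted(closure({"a"}, rows)))   # ['a', 'b']
print(is_closed({"a"}, rows))         # False
print(is_closed({"a", "b"}, rows))    # True
```

Because {"a"} always occurs together with "b", it carries no information beyond {"a", "b"}, which is why miners report only the closed set.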
(In)Effectiveness of Look-Ahead Techniques in a Modern SAT Solver
Giunchiglia, Enrico; Maratea, Marco; Tacchella, Armando