47 research outputs found

An experiment with association rules and classification: post-bagging and conviction

Author: A. Jorge
B. Liu
D. Meretakis
I. Kononenko
I.H. Witten
K. Ali
L. Breiman
M.J. Zaki
P. Domingos
R. Ihaka
R.J. Bayardo
T. Hastie
T.K. Ho
U.M. Fayyad
V. Jovanoski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

In this paper we study a new technique we call post-bagging, which consists of resampling parts of a classification model rather than the data. We do this with a particular kind of model, large sets of classification association rules, and in combination with ordinary best rule and weighted voting approaches. We empirically evaluate the effects of the technique in terms of classification accuracy. We also discuss the predictive power of different metrics used for association rule mining, such as confidence, lift, conviction and χ². We conclude that, for the described experimental conditions, post-bagging improves classification results and that the best metric is conviction.

Funding: Programa de Financiamento Plurianual de Unidades de I&D (multi-year funding programme for R&D units); European Community (EC), European Regional Development Fund (FEDER); Fundação para a Ciência e a Tecnologia (FCT), POSI/SRI/39630/2001/Class Project.
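For readers unfamiliar with the rule metrics named in this abstract, the following is a minimal sketch of how confidence, lift and conviction are commonly defined for a rule A -> C. It uses the standard textbook definitions and is not taken from the paper; the function name `rule_metrics` and its count-based parameters are assumptions made for illustration.

```python
import math


def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Compute common interest metrics for a rule A -> C from raw counts.

    n_total:      number of transactions (hypothetical parameter names)
    n_antecedent: transactions containing A
    n_consequent: transactions containing C
    n_both:       transactions containing both A and C
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent          # estimate of P(C | A)
    consequent_freq = n_consequent / n_total    # estimate of P(C)
    lift = confidence / consequent_freq
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1.0 - consequent_freq) / (1.0 - confidence)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


if __name__ == "__main__":
    print(rule_metrics(n_total=100, n_antecedent=40, n_consequent=50, n_both=30))
```

Unlike lift, conviction is asymmetric in A and C: it compares how often the rule would be wrong if A and C were independent with how often it is actually wrong.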

CiteSeerX

Universidade do Minho: RepositoriUM

Crossref

On the complexity of strongly connected components in directed hypergraphs

Author: A. Elmasry
A.V. Aho
C.C. Özturan
D. Pretolani
D.G. Kirkpatrick
D.M. Yellin
G. Ausiello
G. Gallo
H.N. Gabow
H.T. Kung
J. Cheriyan
L.R. Nielsen
M. Thakur
P. Godfrey
P. Pritchard
R. Tarjan
R.D. Katz
R.J. Bayardo
S. Gaubert
S. Nguyen
T.H. Cormen
X. Allamigeon
X. Liu
Xavier Allamigeon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/02/2013

We study the complexity of some algorithmic problems on directed hypergraphs and their strongly connected components (SCCs). The main contribution is an almost linear time algorithm computing the terminal strongly connected components (i.e. SCCs which do not reach any components but themselves). "Almost linear" here means that the complexity of the algorithm is linear in the size of the hypergraph up to a factor alpha(n), where alpha is the inverse of the Ackermann function and n is the number of vertices. Our motivation to study this problem arises from a recent application of directed hypergraphs to computational tropical geometry. We also discuss the problem of computing all SCCs. We establish a superlinear lower bound on the size of the transitive reduction of the reachability relation in directed hypergraphs, showing that it is combinatorially more complex than in directed graphs. In addition, we prove a linear time reduction from the well-studied problem of finding all minimal sets among a given family to the problem of computing the SCCs. Only subquadratic time algorithms are known for the former problem. These results strongly suggest that the problem of computing the SCCs is harder in directed hypergraphs than in directed graphs.

Comment: v1: 32 pages, 7 figures; v2: revised version, 34 pages, 7 figures
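To make the "minimal sets among a given family" problem concrete, here is a naive illustrative sketch. It is a straightforward quadratic pairwise comparison, not one of the subquadratic algorithms the abstract refers to and not the paper's reduction; the function name `minimal_sets` is an assumption.

```python
def minimal_sets(family):
    """Return the sets in `family` that contain no other member as a proper subset.

    Naive O(k^2) pairwise check over the k distinct sets; shown only to
    illustrate the problem statement.
    """
    # Deduplicate first so identical sets do not interfere with the check.
    sets = list({frozenset(s) for s in family})
    return [s for s in sets if not any(other < s for other in sets)]


if __name__ == "__main__":
    family = [{1, 2}, {1}, {2, 3}, {1, 2, 3}]
    print(sorted(sorted(s) for s in minimal_sets(family)))
```

Here `other < s` is Python's proper-subset test on frozensets, so a set is kept exactly when no other member of the family is strictly contained in it.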

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Polytechnique

Mining and visualizing recommendation spaces for elliptic PDEs with continuous attributes

Author: AGRAWAL R.
BAYARDO R.J.
Calvin J. Ribbens
FUKUDA T.
HAN J.
HIDBER C.
HOUSTIS E.N.
IMIELINSKI T.
Naren Ramakrishnan
PARK J.S.
RAMAKRISHNAN N.
RICE J.
SAAD Y.
SARAWAGI S.
SCHMID W.
STEINACHER S.
SUBRAMANIAN D.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

A SAT-Based Decision Procedure for the Boolean Combination of Difference Constraints

Author: A. Armando
A. Gerevini
C.M. Li
D. Berre Le
E. Giunchiglia
G. Audemard
O. Strichman
P. Prosser
R. Dechter
R.J. Bayardo Jr.
T.H. Cormen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Archivio istituzionale della ricerca - Università di Genova

Using CSP Look-Back Techniques to Solve Exceptionally Hard SAT Instances

Author: R.J. Bayardo
Publication date: 01/01/1996

While CNF propositional satisfiability (SAT) is a sub-class of the more general constraint satisfaction problem (CSP), conventional wisdom has it that some well-known CSP look-back techniques, including backjumping and learning, are of little use for SAT. We enhance the Tableau SAT algorithm of Crawford and Auton with look-back techniques and evaluate its performance on problems specifically designed to challenge it. The Random 3-SAT problem space has commonly been used to benchmark SAT algorithms because consistently difficult instances can be found near a region known as the phase transition. We modify Random 3-SAT in two ways that make instances even harder. First, we evaluate problems with structural regularities and find that CSP look-back techniques offer little advantage. Second, we evaluate problems in which a hard unsatisfiable instance of medium size is embedded in a larger instance, and we find the look-back enhancements to be indispensable. Without them, most instances are "exceptionally hard": orders of magnitude harder than typical Random 3-SAT instances with the same surface characteristics.

CiteSeerX

DisClose: Discovering colossal closed itemsets via a memory efficient compact row-tree

Author: H. Liu
J. Besson
J. Han
R.J. Bayardo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. Because running time increases exponentially with average row length, most conventional algorithms are impractical for mining such datasets. Unfortunately, large cardinality itemsets are likely to be more informative than small cardinality itemsets in this type of dataset. This paper proposes an approach, termed DisClose, to extract large cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large cardinality itemsets are enumerated first, followed by smaller ones. In addition, we utilize a minimum cardinality threshold to further reduce the search space. Experimental results show that DisClose can achieve extraction of colossal closed itemsets in the discovered datasets, even for low support thresholds. The algorithm immediately discovers closed itemsets without needing to check if each new closed itemset has previously been found.

Crossref

The International Islamic University Malaysia Repository