Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
explore finally an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
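The definitions of support and confidence used in the abstract above can be illustrated with a minimal sketch. The function names and the toy dataset are assumptions made for illustration only; they do not come from the paper. Each transaction is modeled as a set of items, and confidence is the empirical conditional probability of the consequent given the antecedent.

```python
from fractions import Fraction


def support(dataset, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in dataset if itemset <= t)
    return Fraction(hits, len(dataset))


def confidence(dataset, antecedent, consequent):
    """Empirical conditional probability of the consequent given the antecedent.

    Returns None when no transaction contains the antecedent
    (a convention assumed here, since the ratio is then undefined).
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    covered = [t for t in dataset if antecedent <= t]
    if not covered:
        return None
    return Fraction(sum(1 for t in covered if both <= t), len(covered))


# Hypothetical toy dataset: four transactions over the items a, b, c.
toy = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]

print(confidence(toy, "a", "b"))  # 2 of the 3 transactions with a also contain b
print(support(toy, "ab"))         # 2 of the 4 transactions contain both a and b
```

Exact fractions are used instead of floats so that comparisons against a confidence threshold are not affected by rounding.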
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015. Comment: Int. Conf. Formal Concept Analysis, 201
The Bases of Association Rules of High Confidence
We develop a new approach for distributed computing of the association rules
of high confidence in a binary table. It is derived from the D-basis algorithm
in K. Adaricheva and J.B. Nation (TCS 2017), which is performed on multiple
sub-tables of a table, obtained by removing several rows at a time. The resulting
rules are then aggregated in the same way that the D-basis is retrieved
from a larger set of implications. This yields a basis of association
rules of high confidence, which can be used for ranking all attributes of the
table with respect to a given fixed attribute using the relevance parameter
introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper
focuses on the technical implementation of the new algorithm. Some tests
are performed on transaction data and medical data. Comment: Presented at DTMN, Sydney, Australia, July 28, 201
Relative Entailment Among Probabilistic Implications
We study a natural variant of the implicational fragment of propositional
logic. Its formulas are pairs of conjunctions of positive literals, related
together by an implicational-like connective; the semantics of this sort of
implication is defined in terms of a threshold on a conditional probability of
the consequent, given the antecedent: we are dealing with what the data
analysis community calls confidence of partial implications or association
rules. Existing studies of redundancy among these partial implications have
characterized so far only entailment from one premise and entailment from two
premises, both in the stand-alone case and in the case of presence of
additional classical implications (this is what we call "relative entailment").
By exploiting a previously noted alternative view of the entailment in terms of
linear programming duality, we characterize exactly the cases of entailment
from arbitrary numbers of premises, again both in the stand-alone case and in
the case of presence of additional classical implications. As a result, we
obtain decision algorithms of better complexity; additionally, for each
potential case of entailment, we identify a critical confidence threshold and
show that it is, actually, intrinsic to each set of premises and antecedent of
the conclusion.
Objective novelty of association rules: measuring the confidence boost
It is well known that the confidence of association rules is not really satisfactory as a measure of interest. Instead of replacing it with other measures (or using it jointly with other measures), we propose to evaluate the novelty of each rule by comparing its confidence with that of stronger rules found in the same dataset. That is, we consider a "relative" confidence threshold instead of the usual absolute threshold. This idea is made precise through the magnitude of the "confidence boost", which measures the relative increase in confidence with respect to stronger rules. We prove that our proposal can replace the "confidence width" and the rule blocking used in previous publications. Postprint (author's final draft)
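The idea of a relative confidence threshold can be sketched as follows. This is a deliberately simplified reading and not the paper's exact definition: it assumes that the "stronger rules" are only those with the same consequent and a strictly smaller antecedent, and it divides the rule's confidence by the best confidence among them. The names `confidence` and `simplified_boost` and the toy data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def confidence(dataset, antecedent, consequent):
    """Empirical conditional probability of consequent given antecedent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    covered = [t for t in dataset if antecedent <= t]
    if not covered:
        return None
    return Fraction(sum(1 for t in covered if both <= t), len(covered))


def simplified_boost(dataset, antecedent, consequent):
    """Ratio of a rule's confidence to the best confidence among rules
    with the same consequent and a strictly smaller antecedent.

    Assumption: this restricted family stands in for "stronger rules".
    """
    antecedent = frozenset(antecedent)
    own = confidence(dataset, antecedent, consequent)
    rivals = []
    for size in range(len(antecedent)):  # proper subsets only
        for smaller in combinations(sorted(antecedent), size):
            c = confidence(dataset, smaller, consequent)
            if c:  # skip undefined or zero confidence
                rivals.append(c)
    if own is None or not rivals:
        return None
    return own / max(rivals)


# Hypothetical toy dataset over the items a, b, c.
toy = [frozenset(t) for t in ("abc", "abc", "ab", "ac", "bc", "c")]

# The rule ab -> c has confidence 2/3, but the empty antecedent already
# predicts c with confidence 5/6, so the ratio falls below 1.
print(simplified_boost(toy, "ab", "c"))
```

Under this reading, a ratio at or below 1 suggests that the rule adds nothing over a simpler one, while a ratio clearly above 1 marks it as novel relative to its simpler competitors.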
Closed-set-based discovery of representative association rules
The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the “essential” or “representative” rules. A previously known algorithm for mining representative rules relies on an incorrect mathematical claim, and can be seen to miss part of its intended output; in previous work, two of the authors of the present paper have offered a complete but, often, somewhat slower alternative. Here, we extend this alternative to the case of closure-based redundancy. The empirical validation shows that, in this way, we can improve on the original time efficiency, without sacrificing completeness. Peer Reviewed. Postprint (author's final draft)
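Closure-based approaches rest on the standard notion of the closure of an itemset: the set of items shared by every transaction that contains it. The sketch below illustrates only that notion, under an assumed convention for unsupported itemsets; it is not the mining algorithm of the paper, and the function name and toy data are hypothetical.

```python
def closure(dataset, itemset):
    """Items common to all transactions that contain `itemset`.

    Returns None when no transaction contains the itemset
    (a convention assumed for this sketch).
    """
    itemset = frozenset(itemset)
    covering = [t for t in dataset if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


# Hypothetical toy dataset over the items a, b, c, d.
toy = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("c")]

# Every transaction containing a also contains b, so {a} is not closed.
print(sorted(closure(toy, "a")))
# {c} equals its own closure, so it is a closed set.
print(closure(toy, "c") == frozenset("c"))
```

An itemset is closed when it equals its own closure. Since an implication from an itemset to the rest of its closure holds with full confidence, working with closed sets lets rules that differ only by such implications be treated together.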