17 research outputs found
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015.
Objective novelty of association rules: measuring the confidence boost
It is well known that the confidence of association rules is not really satisfactory as a measure of interest. Instead of replacing it with other measures (or using it jointly with other measures), we propose to evaluate the novelty of each rule by comparing its confidence with that of stronger rules found in the same dataset. That is, we consider a “relative” confidence threshold instead of the usual absolute threshold. This idea is made precise through the magnitude of the “confidence boost”, which measures the relative increase in confidence with respect to stronger rules. We prove that our proposal can replace the “confidence width” and the
rule blocking employed in previous publications. Postprint (author’s final draft)
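The idea of a relative confidence threshold can be illustrated with a minimal sketch. It assumes the caller already knows the confidences of the stronger rules; which rules count as "stronger" is defined in the paper and is not reproduced here. The function name confidence_boost and the convention of returning infinity when no stronger rule exists are assumptions made for this illustration only.

```python
import math
from typing import Iterable


def confidence_boost(rule_confidence: float,
                     stronger_confidences: Iterable[float]) -> float:
    """Ratio of a rule's confidence to the best confidence among stronger rules.

    Hypothetical helper: the set of stronger rules must be supplied by the
    caller, since its exact definition is not given in the abstract.
    """
    best = max(stronger_confidences, default=0.0)
    if best == 0.0:
        # Assumed convention: with nothing to compare against,
        # the rule is treated as maximally novel.
        return math.inf
    return rule_confidence / best


# A rule with confidence 0.9 whose best stronger rule reaches 0.75.
boost = confidence_boost(0.9, [0.6, 0.75])
# Keep the rule only if its boost exceeds a relative threshold, e.g. 1.1.
is_novel = boost > 1.1
```

Under this reading, a boost close to 1 means that a stronger rule already accounts for almost all of the rule's confidence, so the rule adds little.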
Relative Entailment Among Probabilistic Implications
We study a natural variant of the implicational fragment of propositional
logic. Its formulas are pairs of conjunctions of positive literals, related
together by an implicational-like connective; the semantics of this sort of
implication is defined in terms of a threshold on a conditional probability of
the consequent, given the antecedent: we are dealing with what the data
analysis community calls confidence of partial implications or association
rules. Existing studies of redundancy among these partial implications have
characterized so far only entailment from one premise and entailment from two
premises, both in the stand-alone case and in the case of presence of
additional classical implications (this is what we call "relative entailment").
By exploiting a previously noted alternative view of the entailment in terms of
linear programming duality, we characterize exactly the cases of entailment
from arbitrary numbers of premises, again both in the stand-alone case and in
the case of presence of additional classical implications. As a result, we
obtain decision algorithms of better complexity; additionally, for each
potential case of entailment, we identify a critical confidence threshold and
show that it is, actually, intrinsic to each set of premises and antecedent of
the conclusion.
A Three-phased Online Association Rule Mining Approach for Diverse Mining Requests
In the past, most incremental mining and online mining algorithms considered finding the set of association rules or patterns consistent with the entire set of data inserted so far. Users cannot easily obtain results for only the portion of the data they are interested in. To support ad hoc, query-driven, online mining, we first propose a relation, called the multidimensional pattern relation, that stores the context information and the mining information in a structured and systematic way for later analysis. Each tuple in the relation comes from a dataset inserted into the database. This concept is similar to the construction of a data warehouse for OLAP. However, unlike the summarized information of fact attributes in a data warehouse, the mined patterns in the multidimensional pattern relation cannot be directly aggregated to satisfy users’ mining requests. We then develop an online mining approach called Three-phased Online Association Rule Mining (TOARM), based on the proposed multidimensional pattern relation, to support online generation of association rules under multidimensional considerations. Experiments on both homogeneous and heterogeneous datasets show the effectiveness of the proposed approach.
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in
the field of Data Mining. An association rule is a form of partial implication
between two sets of binary variables. In the most common approach, association
rules are parameterized by a lower bound on their confidence, which is the
empirical conditional probability of their consequent given the antecedent,
and/or by some other parameter bounds such as "support" or deviation from
independence. We study here notions of redundancy among association rules from
a fundamental perspective. We see each transaction in a dataset as an
interpretation (or model) in the propositional logic sense, and consider
existing notions of redundancy, that is, of logical entailment, among
association rules, of the form "any dataset in which this first rule holds must
obey also that second rule, therefore the second is redundant". We discuss
several existing alternative definitions of redundancy between association
rules and provide new characterizations and relationships among them. We show
that the main alternatives we discuss correspond actually to just two variants,
which differ in the treatment of full-confidence implications. For each of
these two notions of redundancy, we provide a sound and complete deduction
calculus, and we show how to construct complete bases (that is,
axiomatizations) of absolutely minimum size in terms of the number of rules. We
explore finally an approach to redundancy with respect to several association
rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
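The abstract above describes confidence as the empirical conditional probability of the consequent given the antecedent, with each transaction read as a set of items. A minimal sketch of that computation follows; the function names, the toy dataset, and the choice to raise an error when no transaction contains the antecedent are assumptions made for illustration, not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], dataset: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t) / len(dataset)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               dataset: List[Transaction]) -> float:
    """Empirical conditional probability of the consequent given the antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    covered = [t for t in dataset if a <= t]
    if not covered:
        # Assumed convention: confidence is undefined when the
        # antecedent never occurs in the dataset.
        raise ValueError("antecedent does not occur in the dataset")
    return sum(1 for t in covered if both <= t) / len(covered)


# Hypothetical toy dataset of four transactions over items a, b, c.
toy = [frozenset("abc"), frozenset("ab"), frozenset("a"), frozenset("bc")]

conf_a_b = confidence({"a"}, {"b"}, toy)   # 2 of the 3 transactions with a also have b
supp_ab = support({"a", "b"}, toy)         # 2 of the 4 transactions contain both
```

A rule such as a -> b would then be accepted when conf_a_b meets the chosen lower bound on confidence, optionally combined with a bound on supp_ab.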
Closed-set-based discovery of representative association rules revisited
The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the “essential” or “representative” rules. We revisit the algorithm given by Kryszkiewicz (Int. Symp. Intelligent Data Analysis 2001, Springer-Verlag LNCS 2189, 350–359) for mining representative rules. We show that its output is sometimes incomplete, due to an oversight in its mathematical validation, and we propose an alternative, complete generator with only slightly longer running times. Postprint (author’s final draft)