149 research outputs found

    Efficiently mining long patterns from databases


    Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules

    Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must also obey that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss actually correspond to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. Finally, we explore an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises. Comment: LMCS accepted paper
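As a rough illustration of the confidence and support parameters mentioned in this abstract, the sketch below computes both over a toy dataset. The function names `support` and `confidence` and the four sample transactions are assumptions made for this example; they are not taken from the paper.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """Empirical conditional probability of `consequent` given `antecedent`.

    Returns None when no transaction contains the antecedent.
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        return None
    return Fraction(sum(1 for t in covered if both <= t), len(covered))

# Hypothetical toy dataset: each transaction is a set of items.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
)]

rule_confidence = confidence(transactions, {"a"}, {"b"})  # rule a -> b
rule_support = support(transactions, {"a", "b"})
```

Under these assumptions, the rule a -> b holds in two of the three transactions that contain a, so it would pass a confidence threshold of 0.6 but not one of 0.7.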

    Beyond Hypergraph Dualization

    This problem concerns hypergraph dualization and its generalization to poset dualization. A hypergraph H = (V, E) consists of a finite collection E of sets over a finite set V, i.e. E ⊆ P(V) (the powerset of V). The elements of E are called hyperedges, or simply edges. A hypergraph is said to be simple if none of its edges is contained within another. A transversal (or hitting set) of H is a set T ⊆ V that intersects every edge of E. A transversal is minimal if it does not contain any other transversal as a subset. The set of all minimal transversals of H is denoted by Tr(H). The hypergraph (V, Tr(H)) is called the transversal hypergraph of H. Given a simple hypergraph H, the hypergraph dualization problem (Trans-Enum for short) concerns the enumeration without repetitions of Tr(H). The Trans-Enum problem can also be formulated as a dualization problem in posets. Let (P, ≤) be a poset (i.e. ≤ is a reflexive, antisymmetric, and transitive relation on the set P). For A ⊆ P, ↓A (resp. ↑A) is the downward (resp. upward) closure of A under the relation ≤ (i.e. ↓A is an ideal and ↑A a filter of (P, ≤)). Two antichains (B+, B−) of P are said to be dual if ↓B+ ∪ ↑B− = P and ↓B+ ∩ ↑B− = ∅. Given an implicit description of a poset P and an antichain B+ (resp. B−) of P, the poset dualization problem (Dual-Enum for short) enumerates the set B− (resp. B+), denoted by Dual(B+) = B− (resp. Dual(B−) = B+). Notice that the function Dual is an involution (self-inverse), i.e. Dual(Dual(B)) = B.
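The definitions of transversal and minimal transversal in this abstract can be made concrete with a small brute-force sketch. It is exponential in |V| and only meant to illustrate the definitions; it is not the enumeration method studied in the paper. The function names and the example hypergraph are assumptions made for this illustration.

```python
from itertools import combinations

def is_transversal(candidate, edges):
    """True if `candidate` intersects every edge."""
    return all(candidate & e for e in edges)

def minimal_transversals(vertices, edges):
    """Enumerate all minimal transversals by increasing size (brute force).

    A transversal is kept only if it contains no previously found (hence
    smaller) transversal, which is exactly the minimality condition.
    """
    edges = [frozenset(e) for e in edges]
    found = []
    for size in range(len(vertices) + 1):
        for combo in combinations(sorted(vertices), size):
            cand = frozenset(combo)
            if any(t <= cand for t in found):
                continue  # contains a smaller transversal: not minimal
            if is_transversal(cand, edges):
                found.append(cand)
    return found

# Hypothetical simple hypergraph: a path on four vertices.
V = {1, 2, 3, 4}
E = [{1, 2}, {2, 3}, {3, 4}]

tr = minimal_transversals(V, E)            # Tr(H)
tr_of_tr = minimal_transversals(V, tr)     # Tr(Tr(H))
```

On this example Tr(H) consists of {1, 3}, {2, 3} and {2, 4}, and dualizing again gives back the original edges, which matches the self-inverse behaviour of dualization on simple hypergraphs.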

    Mobility Data Science (Dagstuhl Seminar 22021)

    This report documents the program and the outcomes of Dagstuhl Seminar 22021 "Mobility Data Science". This seminar was held January 9-14, 2022, including 47 participants from industry and academia. The goal of this Dagstuhl Seminar was to create a new research community of mobility data science in which the whole is greater than the sum of its parts by bringing together established leaders as well as promising young researchers from all fields related to mobility data science. Specifically, this report summarizes the main results of the seminar by (1) defining Mobility Data Science as a research domain, (2) sketching its agenda in the coming years, and (3) building a mobility data science community. (1) Mobility data science is defined as spatiotemporal data that additionally captures the behavior of moving entities (human, vehicle, animal, etc.). To understand, explain, and predict behavior, we note that a strong collaboration with research in behavioral and social sciences is needed. (2) Future research directions for mobility data science described in this report include a) mobility data acquisition and privacy, b) mobility data management and analysis, and c) applications of mobility data science. (3) We identify opportunities towards building a mobility data science community, towards collaborations between academia and industry, and towards a mobility data science curriculum.

    Geometric problems in machine learning
