Search CORE

5 research outputs found

Mining bases for association rules using closed sets

Author: Bastide Yves
Lakhal Lotfi
Pasquier Nicolas
Taouil Rafik
Publication venue: HAL CCSD
Publication date: 28/02/2000
Field of study

International audienceAssociation rules are conditional implications between requent itemsets. The problem of the usefulness and the elevance of the set of discovered association rules is related to the huge number of rules extracted and the presence of many redundancies among these rules for many datasets. We address this important problem using the Galois connection framework and we show that we can generate bases or association rules using the frequent closed itemsets extracted by the Close or the A-Close algorithms

HAL Clermont Université

Hal-Diderot

Employing Inductive Databases in Concrete Applications

Author: Careggio D.
Esposito Roberto
Lanzi P.
Matera M.
Meo Rosa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004
Field of study

Institutional Research Information System University of Turin

Horn axiomatizations for sequential data

Author: Balcázar José L.
Garriga Gemma C.
Publication venue: Elsevier Ltd.
Publication date: 01/03/2007
Field of study

AbstractWe propose a notion of deterministic association rules for ordered data. We prove that our proposed rules can be formally justified by a purely logical characterization, namely, a natural notion of empirical Horn approximation for ordered data which involves background Horn conditions; these ensure the consistency of the propositional theory obtained with the ordered context. The whole framework resorts to concept lattice models from Formal Concept Analysis, but adapted to ordered contexts. We also discuss a general method to mine these rules that can be easily incorporated into any algorithm for mining closed sequences, of which there are already some in the literature

Elsevier - Publisher Connector

Discovering Interesting Patterns and Associations in Data Streams

Author: Jiang Nan
Publication venue
Publication date: 01/01/2009
Field of study

A data stream is a sequence of items that arrive in a timely order. Different from data in traditional static databases, data streams are continuous, unbounded, usually come with high speed, and have a data value distribution that often changes with time (Guha, 2001). As more applications such as web transactions, telephone records, and network flows generate a large number of data streams every day, efficient knowledge discovery of data streams is an active and growing research area in data mining with broad applications. Traditional data mining algorithms are developed to work on a complete static dataset and, thus, cannot be applied directly in data stream applications.One area of data mining research is to mine association relationship in a data set. Most of association mining techniques for data streams can be categorized into two types: those developed based on frequent patterns and those developed based on closed patterns. Due to the number of frequent patterns are often huge and redundant, non-informative patterns are contained in frequent patterns. An alternative way is to develop the association mining approaches for data streaming applications based on closed patterns, which generally represent a small subset of all frequent patterns, but provide complete and condensed information. In these researches, the closed pattern mining is the prerequisite condition for non-redundant and informative association mining.In this dissertation, a sliding window technique for dynamic mining of closed patterns in data streams is proposed, and an approach of mining non-redundant and informative associations based on the discovered closed patterns is developed. The closed pattern and relevant association mining techniques are selected research area in this dissertation. First, the closed patterns for a given collection of data are currently the most compact data knowledge that can provide complete support information for all data patterns.Compared with other techniques, the proposed closed pattern mining technique has potential to largely decrease the number of subsequent combinatorial calculations performed on the data patterns. Second, the memory requirement to store the closed patterns and relevant associations is generally lower than the corresponding frequent patterns and associations. In some data streaming applications, memory usage is an important measurement, because in these applications memory usage is the bottleneck for knowledge discovery. Third, the associations generated for data streams are the knowledge used to identify the relations within the data. The discovered relations can find their wide applications in many data streaming environments.Different from the closed pattern mining techniques on traditional databases, which require multiple scans of the entire database, the proposed technique determines the closed patterns with a single scan. It is an incremental mining process; as the sliding window advances, new data transactions enter and old data transactions exit the window. But instead of regenerating closed patterns from the entire window, the proposed technique updates the old set of closed patterns whenever a new transaction arrives and/or an old transaction leaves the sliding window to obtain the current set of closed patterns. This incremental feature allows the user to get the most recent updated closed patterns without rescanning the entire updated database, which saves not only the computation time, but more importantly, the I/O operating time to load and write data from database to memory. Third, the proposed sliding window technique can handle both the insertion and deletion operations independently, which allows the user to adjust the sliding window size in different application environments. Furthermore, the proposed interesting patterns and association mining framework can handle different users' requests at the same time at their specified support and confidence thresholds, and interested input and output patterns.The research includes both theoretical proofs of correctness for the proposed algorithms and simulation experiments to compare the proposed techniques with those existing in the literature using synthetic and real datasets. The utility of the proposed technique is applied to sensor network databases of a traffic management and an environmental monitoring site for missing data estimation purpose

SHAREOK repository