An efficient closed frequent itemset miner for the MOA stream mining system
Mining itemsets is a central task in data mining, both in the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool for the task hinders its use by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams. Our implementation works on top of the MOA (Massive Online Analysis) stream mining framework to ease its use and integration with other stream mining tasks. We provide a rigorous PAC-style analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare for pattern mining algorithms. As a by-product, the analysis shows how one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we experimentally confirm, on both synthetic and real data, the excellent performance of the algorithm, as reported in the original paper, and its ability to handle concept drift. Postprint (published version)
CICLAD: A Fast and Memory-efficient Closed Itemset Miner for Streams
Mining association rules from data streams is a challenging task due to the
(typically) limited resources available vs. the large size of the result.
Frequent closed itemsets (FCI) enable an efficient first step, yet current FCI
stream miners are not optimal on resource consumption, e.g. they store a large
number of extra itemsets at an additional cost. In a search for a better
storage-efficiency trade-off, we designed Ciclad, an intersection-based
sliding-window FCI miner. Leveraging in-depth insights into FCI evolution, it
combines minimal storage with quick access. Experimental results indicate
Ciclad's memory footprint is much lower and its overall performance better than
that of competitor methods. Comment: KDD2
Requirements and Use Cases; Report I on the sub-project Smart Content Enrichment
In this technical report, we present the results of the first milestone phase
of the Corporate Smart Content sub-project "Smart Content Enrichment". We
present analyses of the state of the art in the fields concerning the three
work packages defined in the sub-project, which are aspect-oriented
ontology development, complex entity recognition, and semantic event pattern
mining. We compare the research approaches related to our three research
subjects and briefly outline our future work plan.