Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled with probability proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.
Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track)
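To make the idea of sampling patterns in proportion to a quality measure concrete, here is a minimal sketch. It is not the Flexics algorithm: it enumerates all frequent itemsets first, which is exactly the step a real sampler avoids, and then draws from them with probability proportional to support. The function names, the toy transactions, and the choice of support as the quality measure are all assumptions made for illustration.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate every non-empty itemset with absolute support >= min_support.

    Brute force over all item combinations; only suitable for toy data.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def sample_patterns(patterns, k, seed=0):
    """Draw k itemsets (with replacement), each with probability
    proportional to its quality, here taken to be its support."""
    rng = random.Random(seed)
    itemsets = list(patterns)
    weights = [patterns[p] for p in itemsets]
    return rng.choices(itemsets, weights=weights, k=k)


# Hypothetical toy database of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
patterns = frequent_itemsets(transactions, min_support=2)
samples = sample_patterns(patterns, k=5)
```

With these transactions, the itemsets {a} and {b} (support 3) are each 1.5 times as likely to be drawn as {c}, {a, b} or {a, c} (support 2).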
A review of associative classification mining
Associative classification mining is a promising approach in data mining that uses
association rule discovery techniques to construct classification systems, also known as
associative classifiers. In the last few years, a number of associative classification algorithms
have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms
employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule
evaluation methods. This paper surveys and compares the state-of-the-art associative
classification techniques with regard to the above criteria. Finally, future directions in associative
classification, such as incremental learning and mining low-quality data sets, are also
highlighted.
Mining Event Logs to Support Workflow Resource Allocation
Workflow technology is widely used to facilitate business processes in
enterprise information systems (EIS), and it has the potential to reduce design
time, enhance product quality and decrease product cost. However, significant
limitations still exist: although resource allocation is an important task in
workflow, many resource allocation operations are still performed manually,
which is time-consuming. This paper presents a data mining approach to address the
resource allocation problem (RAP) and improve the productivity of workflow
resource management. Specifically, an Apriori-like algorithm is used to find
the frequent patterns from the event log, and association rules are generated
according to predefined resource allocation constraints. Subsequently, a
correlation measure named lift is utilized to annotate the negatively
correlated resource allocation rules for resource reservation. Finally, the
rules are ranked by confidence and used as resource allocation rules.
Comparative experiments are performed using C4.5, SVM, ID3, Naïve Bayes and
the presented approach, and the results show that the presented approach is
effective in both accuracy and candidate resource recommendations.
Comment: T. Liu et al., Mining event logs to support workflow resource
allocation, Knowl. Based Syst. (2012), http://dx.doi.org/10.1016/j.knosys.2012.05.01
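The confidence and lift measures mentioned in the abstract above can be illustrated with a small sketch. It is not the paper's implementation: the event log format (a list of activity/resource pairs), the function name `rule_metrics`, and the sample data are assumptions. For a rule "activity -> resource", confidence is the fraction of the activity's events handled by that resource, and lift divides that confidence by the resource's overall share of events, so a lift below 1 indicates negative correlation.

```python
def rule_metrics(log, activity, resource):
    """Return (support, confidence, lift) for the rule activity -> resource.

    `log` is a list of (activity, resource) pairs, a simplified and
    hypothetical stand-in for a workflow event log.
    """
    n = len(log)
    n_activity = sum(1 for a, _ in log if a == activity)
    n_resource = sum(1 for _, r in log if r == resource)
    n_both = sum(1 for a, r in log if a == activity and r == resource)
    if n == 0 or n_activity == 0 or n_resource == 0:
        raise ValueError("activity and resource must both occur in the log")
    support = n_both / n                  # P(activity and resource)
    confidence = n_both / n_activity      # P(resource | activity)
    lift = confidence / (n_resource / n)  # P(resource | activity) / P(resource)
    return support, confidence, lift


# Hypothetical event log: who performed which activity.
log = [
    ("review", "alice"), ("review", "alice"), ("review", "bob"),
    ("approve", "bob"), ("approve", "bob"), ("approve", "alice"),
]

# Rank candidate resources for "review" by confidence, highest first.
candidates = sorted(
    ("alice", "bob"),
    key=lambda r: rule_metrics(log, "review", r)[1],
    reverse=True,
)
```

On this log, "review -> alice" has a lift above 1 and ranks first, while "review -> bob" has a lift below 1 and would be flagged as negatively correlated.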