Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the most fundamental, if not the most fundamental, concerns of computer science. Yet surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of maximal frequent pattern mining problems in different domains (e.g. graphs or sequences). Our results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader range of data mining problems. They show that, once constraints are allowed in the pattern space, the complexities of many maximal frequent pattern mining problems collapse. These problems include maximal frequent subgraphs in labelled graphs, maximal frequent itemsets, and maximal frequent subsequences with no repetitions. Beyond their theoretical interest, our results might yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17).
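To make the notion of a maximal frequent itemset concrete, here is a minimal brute-force sketch. It assumes transactions are Python sets and support is an absolute count; the function names are hypothetical. It runs in time exponential in the number of items and is not the reduction proposed in the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset contained in at least min_support transactions (brute force)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent.append(cand)
                found_any = True
        # Anti-monotonicity: if no itemset of this size is frequent, no larger one can be.
        if not found_any:
            break
    return frequent


def maximal(itemsets):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


if __name__ == "__main__":
    transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bd")]
    print(sorted(sorted(s) for s in maximal(frequent_itemsets(transactions, 2))))
    # prints [['a', 'b'], ['a', 'c']]
```

Here {a, b, c} occurs only once, so with a threshold of 2 the maximal frequent itemsets are {a, b} and {a, c}. Every other frequent itemset is a subset of one of them.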
A framework for visualizing association mining results
Association mining is one of the most widely used data mining techniques because its results are interpretable and actionable. In this study we propose a framework to visualize association mining results, specifically frequent itemsets and association rules, as graphs. We demonstrate the applicability and usefulness of our approach through a Market Basket Analysis (MBA) case study in which we visually explore the data mining results for a supermarket data set. In this case study we derive several interesting insights regarding the relationships among the items and suggest how they can be used as a basis for decision making in retailing.
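One way to render association rules as a graph can be sketched under our own assumptions. Only single-item rules X → Y are mined, each rule becomes a directed edge in Graphviz DOT format, support and confidence are written on the edge label, and confidence is mapped to line width. The framework in the paper may encode rules differently; `mine_pair_rules`, `rules_to_dot` and the width mapping are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Mine rules X -> Y with a single item on each side; support is a fraction of transactions."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        # Try both directions of the pair; confidence = support(X, Y) / support(X).
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


def rules_to_dot(rules):
    """Emit a Graphviz digraph: one edge per rule (assumes item names contain no double quotes)."""
    lines = ["digraph rules {"]
    for x, y, support, confidence in rules:
        width = 1 + 3 * confidence  # hypothetical mapping: stronger rules get thicker edges
        lines.append(f'  "{x}" -> "{y}" [label="s={support:.2f}, c={confidence:.2f}", penwidth={width:.1f}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "eggs"}, {"milk"}]
    print(rules_to_dot(mine_pair_rules(baskets, min_support=0.5, min_confidence=0.6)))
```

The printed text can be passed to Graphviz (e.g. `dot -Tpng`) to draw the rule graph. Frequent itemsets could be added as nodes in the same way.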
Applications of concurrent access patterns in web usage mining
This paper builds on the original data mining and modelling research that proposed the discovery of novel structural relation patterns, and applies that approach to web usage mining. The focus here is on concurrent access patterns (CAP), with an overarching framework that sets out the methodology for post-processing web access patterns. Data pre-processing, pattern discovery and pattern analysis all proceed in conjunction with access pattern mining, CAP mining and CAP modelling. Access patterns are pruned and selected as necessary, so that further CAP mining and modelling can be pursued in the search for the most interesting concurrent access patterns. It is shown that higher-level CAPs can be modelled in a way that brings greater structure to bear on the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach to web navigation.
SemGrAM - Integrating semantic graphs into association rule mining
To date, most association rule mining algorithms have assumed that the domains of items are either discrete or, in a limited number of cases, hierarchical, categorical or linear. This constrains the search for interesting rules to those that satisfy the specified quality metrics as independent values or as higher-level concepts of those values. However, in many cases the determination of a single hierarchy is not practicable and, for many datasets, an item’s value may be taken from a domain that is more conveniently structured as a graph with weights indicating semantic (or conceptual) distance. Research into algorithms that generate disjunctive association rules has allowed the production of rules such as Radios ∨ TVs → Cables. In many cases there is little semantic relationship between the disjunctive terms, and arguably less readable rules such as Radios ∨ Tuesday → Cables can result. This paper describes two association rule mining algorithms, SemGrAMG and SemGrAMP, that accommodate conceptual distance information contained in a semantic graph. The SemGrAM algorithms permit the discovery of rules that include an association between sets of cognate groups of item values. The paper discusses the algorithms, the design decisions made during their development and some experimental results.
Sydney, NS
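The idea of grouping semantically close item values before forming disjunctive rules can be illustrated with a small sketch. It assumes the semantic graph is undirected with non-negative edge weights used as distances. It treats every value within a hypothetical threshold `max_distance` of an item (by shortest path, computed with Dijkstra's algorithm) as that item's cognate group, and measures the support of a disjunctive antecedent as the fraction of transactions that contain any value of the group together with the consequent. This is not SemGrAMG or SemGrAMP, whose details the abstract does not give; all names here are hypothetical.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map {node: {neighbour: weight}} from (a, b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def semantic_distances(graph, source):
    """Dijkstra's algorithm: shortest semantic distance from source to every reachable value."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def cognate_group(graph, item, max_distance):
    """Hypothetical grouping rule: all values within max_distance of item (including item)."""
    return frozenset(v for v, d in semantic_distances(graph, item).items() if d <= max_distance)


def disjunctive_support(transactions, antecedent_group, consequent):
    """Support of (v1 OR v2 OR ...) -> consequent, with the vi drawn from antecedent_group."""
    hits = sum(1 for t in transactions if (t & antecedent_group) and consequent in t)
    return hits / len(transactions)


if __name__ == "__main__":
    graph = build_graph([("Radios", "TVs", 1.0), ("TVs", "Tuesday", 8.0)])
    group = cognate_group(graph, "Radios", max_distance=2.0)
    baskets = [{"Radios", "Cables"}, {"TVs", "Cables"}, {"Tuesday", "Bread"}, {"TVs"}]
    print(sorted(group), disjunctive_support(baskets, group, "Cables"))
    # prints ['Radios', 'TVs'] 0.5
```

With this threshold, Tuesday lies too far from Radios to join its group, so the sketch can produce Radios ∨ TVs → Cables but never Radios ∨ Tuesday → Cables.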