Efficient Analysis of Pattern and Association Rule Mining Approaches
The process of data mining produces various patterns from a given data
source. The most recognized data mining tasks are the discovery of
frequent itemsets, frequent sequential patterns, frequent sequential rules, and
frequent association rules. Numerous efficient algorithms have been proposed for
these tasks. Frequent pattern mining has been a focus of
data mining research, with a large body of literature, and
important progress has been made, ranging from efficient
algorithms for frequent itemset mining in transaction databases to more complex
algorithms for sequential pattern mining, structured pattern mining, and
correlation mining. Association rule mining (ARM) is one of the most widely used
data mining techniques, designed to group objects together from large databases
in order to extract interesting correlations and relations among huge amounts of
data. In this article, we provide a brief review and analysis of the current
status of frequent pattern mining and discuss some promising research
directions. Additionally, this paper includes a comparative study of the
performance of the described approaches.
Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1312.4800; and with arXiv:1109.2427 by other authors
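As a rough illustration of the frequent itemset and association rule tasks named in this abstract, the following is a minimal Apriori-style sketch. It is not taken from the surveyed paper; the function names, the toy transactions, and the thresholds are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns {itemset: support} for every itemset whose relative
    support is at least min_support. Hypothetical helper for illustration.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate and keep the frequent ones.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so a is present.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

On these toy baskets the only rule that reaches a confidence of 0.9 is butter implies bread, since every basket containing butter also contains bread.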
Effective pattern discovery for text mining
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
Towards Distributed Convoy Pattern Mining
Mining movement data to reveal interesting behavioral patterns has gained
attention in recent years. One such pattern is the convoy pattern which
consists of at least m objects moving together for at least k consecutive time
instants where m and k are user-defined parameters. Existing algorithms for
detecting convoy patterns, however, do not scale to real-life dataset sizes.
Therefore, a distributed algorithm for convoy mining is needed. In this
paper, we discuss the problem of convoy mining and analyze different data
partitioning strategies to pave the way for a generic distributed convoy
pattern mining algorithm.
Comment: SIGSPATIAL'15 November 03-06, 2015, Bellevue, WA, US
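The convoy definition above (at least m objects together for at least k consecutive time instants) can be sketched on a single machine as follows. This is not the paper's distributed algorithm. It assumes, as a simplification, that the objects have already been grouped into clusters at each time instant (the `clusters_per_time` input is hypothetical), and it does not filter out convoys that are dominated by larger or longer ones.

```python
def find_convoys(clusters_per_time, m, k):
    """Find groups of >= m objects sharing a cluster for >= k consecutive instants.

    clusters_per_time: one list of sets of object ids per time instant
    (assumed to be precomputed, e.g. by a density-based clustering step).
    Returns a sorted list of (objects, start_index, end_index).
    """
    candidates = []  # list of (frozenset of objects, start index)
    results = set()
    for t, clusters in enumerate(clusters_per_time):
        clusters = [frozenset(c) for c in clusters if len(c) >= m]
        next_candidates = {}
        for objs, start in candidates:
            extended = False
            for c in clusters:
                common = objs & c
                if len(common) >= m:
                    # Keep the earliest start seen for this object set.
                    prev = next_candidates.get(common, start)
                    next_candidates[common] = min(start, prev)
                    if common == objs:
                        extended = True
            # The candidate ended at t - 1; report it if it lasted long enough.
            if not extended and t - start >= k:
                results.add((objs, start, t - 1))
        # Every sufficiently large cluster may start a new candidate at t.
        for c in clusters:
            next_candidates.setdefault(c, t)
        candidates = list(next_candidates.items())
    # Flush candidates that are still alive at the last time instant.
    end = len(clusters_per_time) - 1
    for objs, start in candidates:
        if end - start + 1 >= k:
            results.add((objs, start, end))
    return sorted(results, key=lambda r: (r[1], r[2], sorted(r[0])))


snapshots = [
    [{1, 2, 3}, {4, 5}],
    [{1, 2, 3, 4}, {5}],
    [{1, 2}, {3, 4, 5}],
    [{1, 2, 6}],
]
convoys = find_convoys(snapshots, m=2, k=3)
```

In this toy input, objects 1 and 2 share a cluster at all four instants, so they form the only convoy with m = 2 and k = 3.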
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account, but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.
Comment: PAKDD 2017, extended version
Reductions for Frequency-Based Data Mining Problems
Studying the computational complexity of problems is one of the fundamental
questions in computer science, if not the fundamental one. Yet, surprisingly little is known
about the computational complexity of many central problems in data mining. In
this paper we study frequency-based problems and propose a new type of
reduction that allows us to compare the complexities of the maximal frequent
pattern mining problems in different domains (e.g. graphs or sequences). Our
results extend those of Kimelfeld and Kolaitis [ACM TODS, 2014] to a broader
range of data mining problems. Our results show that, by allowing constraints
in the pattern space, the complexities of many maximal frequent pattern mining
problems collapse. These problems include maximal frequent subgraphs in
labelled graphs, maximal frequent itemsets, and maximal frequent subsequences
with no repetitions. In addition to theoretical interest, our results might
yield more efficient algorithms for the studied problems.
Comment: This is an extended version of a paper of the same title to appear in
the Proceedings of the 17th IEEE International Conference on Data Mining
(ICDM'17)
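To make the notion of a maximal frequent itemset concrete, here is a small brute-force sketch. It only illustrates the definition (a frequent itemset with no frequent proper superset) and is unrelated to the reductions studied in the paper; the function name and the toy data are assumptions, and the exhaustive enumeration is exponential in the number of items.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_count):
    """Return the frequent itemsets that have no frequent proper superset.

    Brute force over all non-empty item subsets; suitable for toy inputs only.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
        if sum(1 for t in transactions if frozenset(c) <= t) >= min_count
    ]
    # Keep an itemset only if no other frequent itemset strictly contains it.
    return [f for f in frequent if not any(f < g for g in frequent)]


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
maximal = maximal_frequent_itemsets(data, min_count=2)
```

With a minimum count of 2, each pair among a, b, and c is frequent while the triple is not, so the three pairs are the maximal frequent itemsets.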
Constraint-based Sequential Pattern Mining with Decision Diagrams
Constrained sequential pattern mining aims at identifying frequent patterns
on a sequential database of items while observing constraints defined over the
item attributes. We introduce novel techniques for constraint-based sequential
pattern mining that rely on a multi-valued decision diagram representation of
the database. Specifically, our representation can accommodate multiple item
attributes and various constraint types, including a number of non-monotone
constraints. To evaluate the applicability of our approach, we develop an
MDD-based prefix-projection algorithm and compare its performance against a
typical generate-and-check variant, as well as a state-of-the-art
constraint-based sequential pattern mining algorithm. Results show that our
approach is competitive with or superior to these other methods in terms of
scalability and efficiency.
Comment: AAAI201
